The number of queries (reporting usage) for whole InfoProviders are calculated as summation of query executions that accessed this InfoProvider.

The main source of this information can be found in /DVD/QS tables where the collected query statistics (extended) from system resides. This information is gathered by Datavard Insights query statistics enhancement which is part of Datavard Insights product transport. Number of queries (reporting usage) that were run on InfoProvider data part is calculated as follows:

For each query that was run on InfoProvider directly:
- Characteristic selections are checked and for all data parts of InfoProvider only those that belong in the selection are marked as read by query.
- If no relevant characteristic selection was used, all data parts are marked as read by the query.
- For each query that was run on Multiprovider, SPO, Compounded InfoProvider:
  - If standard query statistics information is available for query and query didn’t use cache only InfoProviders that are specified as partial InfoProvider are processed by next steps.
  - Further checks are performed if 0INFOPROV selection was used in the query id. If yes only child InfoProviders that belong to this selection are processed. Further processing is same as for direct inforpovider query read.

If is possible to customize whether you want to calculate weighted average for data groups that include more values (e.g. data group <=2013 includes (2013, 2012,..)) or do simple summation for them, that is used by default.

In following example you can find reason why you could use weight average for InfoProvider data groups that includes more values.

Example:

We have InfoProvider: CO_AB that was analyzed using 0CALYEAR characteristic. As result InfoProvider data were distributed into following values:

Data with 0CALYEAR = 2014, size = 5 GB, reporting usage = 150
Data with 0CALYEAR = 2013, size = 15 GB, reporting usage = 85
Data with 0CALYEAR = 2012, size = 18 GB, reporting usage = 60
Data with 0CALYEAR = 2011, size = 14 GB, reporting usage = 37
Data with 0CALYEAR = 2010, size = 563 GB, reporting usage = 2

We are using STANDARD data distribution analysis that looks following:

>=2018, 2017, 2016, 2015, 2014, <=2013

In our example total number of queries executed on top of CO_AB InfoProvider was 162. If we use sum of reporting usage of data parts this would give us for data group <=2013 value 184. As each query can access 0..N data parts of InfoProvider. If we use weighted average we will get following number.

w.a. (( 15 * 85 ) + ( 18 * 60 ) + ( 14 * 37 ) + ( 563 * 2 )) / ( 15 + 18 + 14 + 563 ) = 6.556

Value ~6 means that data group <=2013 of the InfoProvider data have really low reporting usage so the potential of this InfoProvider for archiving is high.

In new Data Distribution analyses weight average is used for calculation of data groups reporting usage (as they can cumulate multiple data parts) while InfoProvider total usage is analyzed separately.

How to switch on weighted average

You have option to switch between calculation of weighted average and summation for InfoProvider data parts. By default summation is used but you can switch it in expert settings.

Go to transaction /DVD/RL_SETT_EXPERT and fill tool name HM_BW and execute

Change parameter HM_BW_DATA_GROUP_AGG_WEIGH_AVG to X if you want to use weighted average

Results will be recalculated after next data distribution collector execution.

Why number of InfoProvider usage can be higher than summation of it data parts?

Some times it can happen situation where we have following result record:

InfoProvider	Type	Split time characteristic	InfoProvider size	InfoProvider usage	Usage <=2013	Usage 2014	Usage 2015	Usage 2016	Usage 2017	Usage >=2018	Size <=2013	Size 2014	Size 2015	Size 2016	Size 2017	Size >=2018
ZCUBE_XY	INFOCUBE	0FISCYEAR	144	102	0	0	0	0	0	0	76	68	0	0	0	0

We can see that size was successfully distributes into data parts (Split time characteristic was identified) but usage for whole InfoProvider is 102, question is why these 102 usages wasn't distributed to InfoProvider's data parts. This can have multiple reasons:

Reason 1: Empty data parts cannot have any usage

What we can say about this InfoProvider is that it have data only from data part 2014 and <=2013. Therefore possible usage of data could be found only in those two data parts no others.

Note that we are analyzing only online data size, therefore if particular data part was already archived, size for part will be 0 in result.

Reason 2: Selections / filters that was used during query execution contain filter for time characteristic but it point to empty data part

Example:

Lets have query execution with following filters for particular InfoProvider:

Object	Sequence	Sign	Option	Low
0CO_AREA	1	I	EQ	1000
0CO_AREA	2	I	EQ	2000
0FISCYEAR	1	I	EQ	K22016
ZMONTH	1	I	EQ	2010001

What we can read from filters is that user try to read data from 2016 but InfoProvider ZCUBE_XY doesn't have any data in 2016 data part, therefore we cannot add usage there. If you are wondering why object ZMONTH wasn't use read following documentation.

(DI-1802) Appendix C: How is calculated InfoProvider reporting usage

How to switch on weighted average

Why number of InfoProvider usage can be higher than summation of it data parts?