The number of queries (reporting usage) for whole InfoProviders are calculated as summation of query executions that accessed this InfoProvider.
The main source of this information can be found in /DVD/QS tables where the collected query statistics (extended) from system resides. This information is gathered by Datavard Insights query statistics enhancement which is part of Datavard Insights product transport. Number of queries (reporting usage) that were run on InfoProvider data part is calculated as follows:
- For each query that was run on InfoProvider directly:
- Characteristic selections are checked and for all data parts of InfoProvider only those that belong in the selection are marked as read by query.
- If no relevant characteristic selection was used, all data parts are marked as read by the query.
- For each query that was run on Multiprovider, SPO, Compounded InfoProvider:
- If standard query statistics information is available for query and query didn’t use cache only InfoProviders that are specified as partial InfoProvider are processed by next steps.
- Further checks are performed if 0INFOPROV selection was used in the query id. If yes only child InfoProviders that belong to this selection are processed. Further processing is same as for direct inforpovider query read.
If is possible to customize whether you want to calculate weighted average for data groups that include more values (e.g. data group <=2013 includes (2013, 2012,..)) or do simple summation for them, that is used by default.
In following example you can find reason why you could use weight average for InfoProvider data groups that includes more values.
Example:
We have InfoProvider: CO_AB that was analyzed using 0CALYEAR characteristic. As result InfoProvider data were distributed into following values:
- Data with 0CALYEAR = 2014, size = 5 GB, reporting usage = 150
- Data with 0CALYEAR = 2013, size = 15 GB, reporting usage = 85
- Data with 0CALYEAR = 2012, size = 18 GB, reporting usage = 60
- Data with 0CALYEAR = 2011, size = 14 GB, reporting usage = 37
- Data with 0CALYEAR = 2010, size = 563 GB, reporting usage = 2
We are using STANDARD data distribution analysis that looks following:
>=2018, 2017, 2016, 2015, 2014, <=2013
In our example total number of queries executed on top of CO_AB InfoProvider was 162. If we use sum of reporting usage of data parts this would give us for data group <=2013 value 184. As each query can access 0..N data parts of InfoProvider. If we use weighted average we will get following number.
w.a. (( 15 * 85 ) + ( 18 * 60 ) + ( 14 * 37 ) + ( 563 * 2 )) / ( 15 + 18 + 14 + 563 ) = 6.556
Value ~6 means that data group <=2013 of the InfoProvider data have really low reporting usage so the potential of this InfoProvider for archiving is high.
In new Data Distribution analyses weight average is used for calculation of data groups reporting usage (as they can cumulate multiple data parts) while InfoProvider total usage is analyzed separately.
How to switch on weighted average
You have option to switch between calculation of weighted average and summation for InfoProvider data parts. By default summation is used but you can switch it in expert settings.
Go to transaction /DVD/RL_SETT_EXPERT and fill tool name HM_BW and execute
Change parameter HM_BW_DATA_GROUP_AGG_WEIGH_AVG to X if you want to use weighted average
Results will be recalculated after next data distribution collector execution.
Why number of InfoProvider usage can be higher than summation of it data parts?
Some times it can happen situation where we have following result record:
InfoProvider | Type | Split time characteristic | InfoProvider size | InfoProvider usage | Usage <=2013 | Usage 2014 | Usage 2015 | Usage 2016 | Usage 2017 | Usage >=2018 | Size <=2013 | Size 2014 | Size 2015 | Size 2016 | Size 2017 | Size >=2018 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ZCUBE_XY | INFOCUBE | 0FISCYEAR | 144 | 102 | 0 | 0 | 0 | 0 | 0 | 0 | 76 | 68 | 0 | 0 | 0 | 0 |
We can see that size was successfully distributes into data parts (Split time characteristic was identified) but usage for whole InfoProvider is 102, question is why these 102 usages wasn't distributed to InfoProvider's data parts. This can have multiple reasons:
Reason 1: Empty data parts cannot have any usage
What we can say about this InfoProvider is that it have data only from data part 2014 and <=2013. Therefore possible usage of data could be found only in those two data parts no others.
Note that we are analyzing only online data size, therefore if particular data part was already archived, size for part will be 0 in result.
Reason 2: Selections / filters that was used during query execution contain filter for time characteristic but it point to empty data part
Example:
Lets have query execution with following filters for particular InfoProvider:
Object | Sequence | Sign | Option | Low | High |
---|---|---|---|---|---|
0CO_AREA | 1 | I | EQ | 1000 | |
0CO_AREA | 2 | I | EQ | 2000 | |
0FISCYEAR | 1 | I | EQ | K22016 | |
ZMONTH | 1 | I | EQ | 2010001 |
What we can read from filters is that user try to read data from 2016 but InfoProvider ZCUBE_XY doesn't have any data in 2016 data part, therefore we cannot add usage there. If you are wondering why object ZMONTH wasn't use read following documentation.