(DI-2008) Calculation of the InfoProvider Usage
The number of queries for a whole InfoProvider is calculated as a sum of query executions that accessed an InfoProvider.
The main source of this information is stored in /DVD/QS tables, which contain the extended query statistics collected from the system. This information is gathered by Datavard Insights query statistics enhancement.
The number of queries that were run on an InfoProvider data part is calculated as follows:
- For each query that was run on InfoProvider directly:
- The system checks a selection of characteristics and only those InfoProvider data parts that belong to this selection, are marked as read by the query.
- If the time characteristic wasn't found, all data parts are marked as read by a query.
- For each query that was run on MultiProvider, SPO, compound InfoProvider:
- The data distribution analysis processes only child InfoProviders, if standard query statistics information (OLAP) is turned on during a query run and the query run didn’t use a cache.
- If the 0INFOPROV selection was used in a query, then the data distribution analysis processes only the child InfoProviders that belong to this selection. Further processing is the same as by a query that was run on an InfoProvider directly.
You can customize, whether you want to calculate the weighted average of data groups that include more values (e.g., data group <=2013 includes (2013, 2012 etc.)) or perform a simple sum of them, which is used by default.
Please note that if a data group (e.g. <=2013) contains data from multiple years, the InfoProvider usage is calculated as a simple sum of these years. This can cause the data group usage is greater than usage of the whole InfoProvider.
For more information about how to use the weighted average instead of a sum, please see the text below.
In the following example are listed reasons, why you can use weighted average for InfoProvider data groups.
Example:
The InfoProvider: CO_AB was analyzed by using the 0CALYEAR characteristic. As a result InfoProvider data was divided into the following values:
- Data with 0CALYEAR = 2014, size = 5 GB, reporting usage = 150
- Data with 0CALYEAR = 2013, size = 15 GB, reporting usage = 85
- Data with 0CALYEAR = 2012, size = 18 GB, reporting usage = 60
- Data with 0CALYEAR = 2011, size = 14 GB, reporting usage = 37
- Data with 0CALYEAR = 2010, size = 563 GB, reporting usage = 2
In this example is displayed a Data distribution analysis that looks as follows:
>=2018, 2017, 2016, 2015, 2014, <=2013
In our example the total number of queries executed on top of CO_AB InfoProvider is 162. When we calculate the sum of queries of data parts for the data group <=2013, we would get the value 184. As each query can access 0..N data parts of InfoProvider. If we use the weighted average, we will obtain the following number.
w.a. (( 15 * 85 ) + ( 18 * 60 ) + ( 14 * 37 ) + ( 563 * 2 )) / ( 15 + 18 + 14 + 563 ) = 6.556
Value ~6 means that data group <=2013 of InfoProvider data has a really low reporting usage, so the potential of this InfoProvider to be archived is high.
How to switch on weighted average
You can choose between the option to perform the calculation based on weighted average or based on the sum of InfoProvider data parts. By default the system uses the sum, but you can change this in expert settings.
Enter the transaction /DVD/RL_SETT_EXPERT and fill in the the tool name HM_BW and execute.
If you want to use the weighted average, change the parameter HM_BW_DATA_GROUP_AGG_WEIGH_AVG to X.
Results are going to be recalculated after next execution of the collector. (The Monitoring profile HMBW_1DAY)
Why the number of InfoProvider usage can be higher than sum of its data groups
Sometimes it can happen that we obtain the following record as a result of the Data distribution analysis:
InfoProvider | Type | Split time characteristic | InfoProvider size | InfoProvider usage | Usage <=2013 | Usage 2014 | Usage 2015 | Usage 2016 | Usage 2017 | Usage >=2018 | Size <=2013 | Size 2014 | Size 2015 | Size 2016 | Size 2017 | Size >=2018 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ZCUBE_XY | INFOCUBE | 0FISCYEAR | 144 | 102 | 0 | 0 | 0 | 0 | 0 | 0 | 76 | 68 | 0 | 0 | 0 | 0 |
We can see the size was successfully distributed into data groups (Split time characteristic was identified). However, the usage for the whole InfoProvider is 102, but these 102 usages weren't distributed to InfoProvider's data groups. This can have several reasons:
Reason 1: Empty data groups cannot have any usage
This InfoProvider has data only from the data groups 2014 and <=2013. Therefore, data could be added only to these two data groups.
We are analyzing only online data size, therefore if the particular data group has been already archived, the size for this group will be 0.
Reason 2: Selections that were used during the execution of a query contain a filter for a time characteristic which points to an empty or not existing data group.
Example:
An execution of a query with the following selections for the above InfoProvider:
Object | Sequence | Sign | Option | Low | High |
---|---|---|---|---|---|
0CO_AREA | 1 | I | EQ | 1000 | |
0CO_AREA | 2 | I | EQ | 2000 | |
0FISCYEAR | 1 | I | EQ | K22016 | |
ZMONTH | 1 | I | EQ | 2010001 |
What we can see from the above filters, is that user tried to read data from 2016, but the InfoProvider ZCUBE_XY doesn't have any data in 2016 data group. Therefore, the usage has the value 0 in the record. If you want to know, why the object ZMONTH wasn't used, you can check the section Additional Time Characteristics.