(DI-1705) How is calculated InfoProvider reporting usage

The number of queries (reporting usage) for whole InfoProviders are calculated as weight average of reporting usage of their respective data parts if there are any.

The main source of this information can be found in /DVD/QS tables where the collected query statistics (extended) from system resides. This information is gathered by Datavard Insights query statistics enhancement which is part of Datavard Insights product transport. Number of queries (reporting usage) that were run on InfoProvider data part is calculated as follows:

  • For each query that was run on InfoProvider directly:
    • Characteristic selections are checked and for all data parts of InfoProvider only those that belong in the selection are marked as read by query.
    • If no relevant characteristic selection was used, all data parts are marked as read by the query.
    • For each query that was run on Multiprovider, SPO, Compounded InfoProvider:
      • If standard query statistics information is available for query and query didn’t use cache only InfoProviders that are specified as partial InfoProvider are processed by next steps.
      • Further checks are performed if 0INFOPROV selection was used in the query is. If yes only child InfoProviders that belong to this selection are processed. Further processing is same as for direct inforpovider query read.

In following example you can find reason why weight avarage is used and not summation for whole InfoProvider. 

Example:

We have InfoProvider: CO_AB that was analyzed using 0CALYEAR characteristic. As result InfoProvider data were distributed into following data groups:

  1. Data with 0CALYEAR = 2017, size = 5 GB, reporting usage = 150 
  2. Data with 0CALYEAR = 2016, size = 15 GB, reporting usage = 85
  3. Data with 0CALYEAR = 2015, size = 18 GB, reporting usage = 60 
  4. Data with 0CALYEAR = 2014, size = 14 GB, reporting usage = 37 
  5. Data with 0CALYEAR <= 2013, size = 563 GB, reporting usage = 2 

In our example total number of queries executed on top of CO_AB InfoProvider was 162. We cannot use sum of reporting usage of data parts as it would give us false value for whole InfoProvider 334. As each query can access 0..N data parts of InfoProvider. Maximum value of data parts usage 150 is also not correct as that partion did not have to be accessed by all queries ( in our case it was not ).

For correct identification of archiving potential of InfoProvider also direct value of 162 is not sufficient as we would assume that data of InfoProvider are accessed quite often. Using weight avarage formula we will get different value (as we take into account also each data part size):

w.a. ( 5 * 150 ) + ( 15 * 85 ) + ( 18 * 60 ) + ( 14 * 37 ) + ( 563 * 2 ) / ( 5 + 15 + 18 + 14 + 563 ) = 7.721 

Value ~7 means that most of the InfoProvider data have really low reporting usage so the potential of this InfoProvider for archiving is high. 

Please note that total InfoProvider reporting usage calculated as weight avarage is not used everywhere. In new Data Distribution analyses weight average is used for calculation od data groups reporting usage (as they can cumulate multiple data parts) while InfoProvider total usage is analyzed separatelly.