(Glue-2408) Automated Data Split Based on the Combination of Characteristic Values

This partitioning logic splits the source data automatically based on selected InfoProvider characteristics. This data split allows you to extract all the InfoProvider data in smaller portions and therefore avoid memory overflow issues. This data split is relevant only for ListCube Fetchers.

The data split occurs on two levels:

Initial split during the generation of Mass Execution tasks. The values of the characteristic with the highest cardinality are split equally into N ranges, in combination with the values of the other characteristics marked on the automated data split screen. N is the Number of portions parameter defined on the configuration screen.

Secondary split in a ListCube Fetcher during the extraction. When the next data package is requested, the fetcher splits the selection into the smallest possible portions and retrieves data from the InfoProvider sequentially with those selections until the package size is reached.
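The secondary split can be pictured with a minimal Python sketch. It is not the fetcher's actual implementation: `fetch_packages`, `read_rows`, and `fake_provider` are hypothetical names, the smallest portion is assumed to be a single characteristic value, and a package is assumed to be complete as soon as it holds at least `package_size` rows.

```python
from typing import Callable, Dict, Iterator, List, Sequence

Row = Dict[str, str]


def fetch_packages(
    values: Sequence[str],
    read_rows: Callable[[str], List[Row]],
    package_size: int,
) -> Iterator[List[Row]]:
    """Yield data packages built from sequential single-value selections."""
    package: List[Row] = []
    for value in values:
        # Assumption: the smallest portion is one characteristic value per read.
        package.extend(read_rows(value))
        if len(package) >= package_size:
            yield package
            package = []
    if package:
        # Remaining rows form the last, possibly smaller, package.
        yield package


# Hypothetical stand-in for the InfoProvider: rows keyed by 0CALDAY value.
fake_provider = {
    "20050101": [{"0CALDAY": "20050101"}] * 2,
    "20050102": [{"0CALDAY": "20050102"}] * 3,
    "20050103": [{"0CALDAY": "20050103"}] * 1,
}

packages = list(
    fetch_packages(sorted(fake_provider), lambda v: fake_provider[v], package_size=4)
)
```

Because each read covers only one value, the amount of data held in memory at any time stays close to the package size, which is the point of the split.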

Example 1: Automated split based on one characteristic:

In the Mass Execution, we selected one characteristic for the automated data split, 0CALDAY (calendar day), and defined the Number of portions parameter.

Once the configuration is set, we can execute the extraction.

Ten tasks are created (the Number of portions), and at most three jobs run in parallel (set by the extraction process parameter Maximum number of jobs).
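The relationship between the two parameters can be illustrated with a short Python sketch. It only models the scheduling idea (ten tasks, no more than three running at once) with a thread pool; the product itself runs background jobs, and `run_tasks` and `run_task` are hypothetical names used for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_tasks(number_of_tasks: int, max_jobs: int) -> Tuple[List[int], int]:
    """Run all tasks with at most `max_jobs` in parallel; return results and peak."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def run_task(task_id: int) -> int:
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        try:
            # Placeholder for extracting one portion of the data.
            return task_id
        finally:
            with lock:
                state["running"] -= 1

    # max_workers plays the role of the Maximum number of jobs parameter.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        finished = list(pool.map(run_task, range(1, number_of_tasks + 1)))
    return finished, state["peak"]


finished, peak = run_tasks(number_of_tasks=10, max_jobs=3)
```

All ten tasks complete, but the observed peak of concurrently running tasks never exceeds three.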

In the monitor, we can see ten background jobs. In the selection of each job, we can see how the data was divided based on 0CALDAY (calendar day).

The first portion starts on 01.01.2005, which is the lowest value found, and the last portion ends on 03.03.2006.

The second portion starts where the first portion ends and so on.
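As a rough illustration of this example, the following Python sketch splits 0CALDAY values into ten contiguous ranges. It rests on several assumptions that the text does not confirm: every calendar day between 01.01.2005 and 03.03.2006 is present in the data, 03.03.2006 is the highest value, "equally" means an equal number of distinct values per portion, and each portion starts on the value following the end of the previous one. `split_values` is a hypothetical helper, not part of the product.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_values(values: List[date], portions: int) -> List[Tuple[date, date]]:
    """Split the distinct values into contiguous (low, high) ranges of near-equal size."""
    ordered = sorted(set(values))
    if not ordered or portions < 1:
        return []
    portions = min(portions, len(ordered))
    base, extra = divmod(len(ordered), portions)
    ranges: List[Tuple[date, date]] = []
    start = 0
    for i in range(portions):
        # The first `extra` portions take one additional value each.
        size = base + (1 if i < extra else 0)
        chunk = ordered[start:start + size]
        ranges.append((chunk[0], chunk[-1]))
        start += size
    return ranges


# Assumed data: one value per calendar day in the example's interval.
first, last = date(2005, 1, 1), date(2006, 3, 3)
days = [first + timedelta(days=n) for n in range((last - first).days + 1)]
ranges = split_values(days, portions=10)

for low, high in ranges:
    print(low.strftime("%d.%m.%Y"), "-", high.strftime("%d.%m.%Y"))
```

Each printed range corresponds to the selection of one task in the monitor; the actual boundaries in a real system depend on which values exist in the InfoProvider.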

Example 2: Automated split based on two characteristics:

This case works like the split based on one characteristic, but only the characteristic with the highest cardinality is split.
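Choosing which characteristic gets split can be sketched in Python as follows. This is only an illustration: `pick_split_characteristic` is a hypothetical helper, cardinality is assumed to mean the number of distinct values found in the data, and 0COMP_CODE is used purely as an example of a second characteristic.

```python
from typing import Dict, List, Sequence


def pick_split_characteristic(
    rows: Sequence[Dict[str, str]], candidates: Sequence[str]
) -> str:
    """Return the candidate characteristic with the most distinct values."""
    cardinality = {name: len({row[name] for row in rows}) for name in candidates}
    # On a tie, max() keeps the first candidate in the given order.
    return max(candidates, key=lambda name: cardinality[name])


# Illustrative data: four distinct days, two distinct company codes.
rows: List[Dict[str, str]] = [
    {"0CALDAY": "20050101", "0COMP_CODE": "1000"},
    {"0CALDAY": "20050102", "0COMP_CODE": "1000"},
    {"0CALDAY": "20050103", "0COMP_CODE": "2000"},
    {"0CALDAY": "20050104", "0COMP_CODE": "2000"},
]

split_on = pick_split_characteristic(rows, ["0COMP_CODE", "0CALDAY"])
```

Here 0CALDAY is selected for the range split, while 0COMP_CODE is left unsplit and only contributes its values to the selections.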

Navigational attributes are not supported.

Do not use the same field in the selection filter and in the split.