(Glue-2305) Automated Data Split Based on the Combination of Characteristic Values

This partitioning logic automatically splits the source data based on selected InfoProvider characteristics. The split allows the whole InfoProvider to be extracted in smaller portions, which avoids memory overflow issues. It is relevant only for the ListCube Fetcher.

The data split occurs on two levels:

  1. Initial split during generation of Mass Execution tasks. The values of the characteristic with the highest cardinality are split equally into {n} ranges, in combination with values from the other characteristics marked in the automated data split screen, where n is the Number of portions parameter defined in the configuration screen.

  2. Secondary split in the ListCube Fetcher during the extraction. When the next data package is requested, the fetcher splits the selection into the smallest portions and reads data from the InfoProvider sequentially with those selections until the package size is reached.
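The secondary split can be pictured with a minimal Python sketch. This is not the ListCube Fetcher implementation: `fetch_packages`, `read_portion`, and the `(low, high)` selection tuples are hypothetical names introduced only for illustration. The sketch assumes that each smallest portion is read as a whole, so a package can slightly exceed the package size.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Row = Dict[str, str]
Selection = Tuple[str, str]  # hypothetical (low, high) range of one smallest portion


def fetch_packages(
    portions: Sequence[Selection],
    read_portion: Callable[[Selection], List[Row]],
    package_size: int,
) -> Iterator[List[Row]]:
    """Read the smallest portions one after another and yield a package
    as soon as at least `package_size` rows have been collected."""
    package: List[Row] = []
    for selection in portions:
        # Assumption: a portion is always read completely before the
        # package size is checked, so a package may overshoot slightly.
        package.extend(read_portion(selection))
        if len(package) >= package_size:
            yield package
            package = []
    if package:
        # Remaining rows form the last, possibly smaller, package.
        yield package
```

Because the function is a generator, the next portion is only read when the caller asks for the next package, which mirrors the "once the next data package is requested" behaviour described above.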

Example 1. Automated split based on 1 characteristic:

In Mass Execution, we selected one characteristic for the automated data split, 0CALDAY (Calendar Day), and set the Number of portions.

Once the configuration screen is filled in, we can execute the extraction.

Ten tasks were created (one per portion, as set in Number of portions), and at most three jobs run in parallel (set by the Maximum number of jobs parameter of the extraction process).

In Monitor, we can see ten background jobs, and the selection of each job shows how the data was divided based on 0CALDAY (Calendar Day).

The first portion starts on 01.01.2005 (this is the lowest value found) and the last portion is on 03.03.2006.

The second portion starts where the first portion ends and so on.
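The resulting ranges can be approximated with a short Python sketch. It is an illustration only: `split_date_range` is a hypothetical helper, and it assumes that the range between the lowest and highest date is divided into contiguous, non-overlapping portions of nearly equal length in days. The product may instead split by the distinct values found in the data, and may treat the portion boundaries differently.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_date_range(low: date, high: date, portions: int) -> List[Tuple[date, date]]:
    """Split the inclusive range [low, high] into contiguous date ranges."""
    if portions < 1:
        raise ValueError("portions must be at least 1")
    if high < low:
        raise ValueError("high must not be earlier than low")

    total_days = (high - low).days + 1
    # Never create more portions than there are days to distribute.
    portions = min(portions, total_days)
    base, extra = divmod(total_days, portions)

    ranges: List[Tuple[date, date]] = []
    start = low
    for index in range(portions):
        # The first `extra` portions get one additional day.
        length = base + (1 if index < extra else 0)
        end = start + timedelta(days=length - 1)
        ranges.append((start, end))
        # Assumption: the next portion starts on the day after this one ends.
        start = end + timedelta(days=1)
    return ranges


if __name__ == "__main__":
    for low, high in split_date_range(date(2005, 1, 1), date(2006, 3, 3), 10):
        print(low.strftime("%d.%m.%Y"), "-", high.strftime("%d.%m.%Y"))
```

With the dates from this example and ten portions, the sketch produces ten ranges that start on 01.01.2005 and end on 03.03.2006.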

Example 2. Automated split based on 2 characteristics:

This works in the same way as the split on one characteristic, but only the characteristic with the highest cardinality is split.
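A rough Python sketch of this rule follows. The function `build_task_selections` and the second characteristic 0COMP_CODE are used purely as an illustration, and the way the non-split characteristic enters each selection (here as one full low-to-high range) is an assumption, not documented behaviour.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[str, str]  # hypothetical (low, high) selection range


def build_task_selections(
    values_by_char: Dict[str, Sequence[str]], portions: int
) -> List[Dict[str, Range]]:
    """Split only the characteristic with the most distinct values."""
    if portions < 1 or not values_by_char:
        raise ValueError("need at least one portion and one characteristic")

    # Pick the characteristic with the highest cardinality.
    split_char = max(values_by_char, key=lambda c: len(set(values_by_char[c])))
    split_values = sorted(set(values_by_char[split_char]))
    if not split_values:
        return []

    portions = min(portions, len(split_values))
    base, extra = divmod(len(split_values), portions)

    selections: List[Dict[str, Range]] = []
    position = 0
    for index in range(portions):
        size = base + (1 if index < extra else 0)
        chunk = split_values[position:position + size]
        position += size
        selection: Dict[str, Range] = {split_char: (chunk[0], chunk[-1])}
        for char, values in values_by_char.items():
            if char != split_char and values:
                # Assumption: other characteristics are not split and are
                # passed along with their full value range.
                selection[char] = (min(values), max(values))
        selections.append(selection)
    return selections
```

For example, with six 0CALDAY values, two 0COMP_CODE values, and three portions, each of the three selections covers two calendar days together with the whole 0COMP_CODE range.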

Navigational attributes are not supported.

Do not use the same field in the selection filter and in the split.