(Glue-2305) Automated Data Split Based on the Combination of Variable Values

This partitioning logic splits the source data automatically based on selected query variables. Using this data split allows extracting the whole query data by smaller portions thus avoiding memory overflow issues. This data split is relevant only for BEx Query Fetcher (BICS).

The data split occurs on two levels:

  1. Initial split during generation of Mass Execution tasks. The values of a variable with the highest cardinality are equally split into {n} ranges in combination with values from other variables marked in the automated data split screen, where n is is the Number of portions parameter defined configuration screen.

  2. Secondary split in BEx Query Fetcher (BICS) during the extraction. Once the next data package is requested, the fetcher will split the selection by the smallest portions and sequentially execute the query with those selections till the package size is reached.

In the next picture you can find the configuration screen:

Example 1. Automated split based on 1 variable:

In Mass Execution we selected one variable for automated data split - CALDAY_SELOPT (Calendar Day).

On Extraction Process main screen we specify static filter - two values for variable /DVD/DM_MATTYPE (Material Type):

  • SERVICES

  • TRADING GOODS

Number of portions is set to 2.

The initial split found out that the query contains the following values for the Calendar Day:

01.01.2005 - 10.01.2005

Next up, those values are split into “Number of portions”, i.e. 2.

Therefore, every single task contains data for 5 days and the selection which the user-specified for Material Type. Below you can see how the selection looks like.

Selection for Calendar Day for the first job:

Selection for Material Type for the first job:

Similarly, the data will be split for the second mass execution job. There will be Calendar Day between 06.01.2005 and 10.01.2005 and the same selection for Material Type.

The fetcher will then combine every single Calendar Day with the static selection for Material Type (selection specified by user) into 5 portions and will run query till the package size is reached. See the example of generated portions below.

Portion 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 2.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 3.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 4.
CALDAY_SELOPT: 04.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 5.
CALDAY_SELOPT: 05.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

In a similar way the data will be processed in the second mass execution job:

Portion 1.
CALDAY_SELOPT: 06.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 2.
CALDAY_SELOPT: 07.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 3.
CALDAY_SELOPT: 08.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 4.
CALDAY_SELOPT: 09.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 5.
CALDAY_SELOPT: 10.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Example 2. Automated split based on two variables:

In Mass Execution we selected two variables for automated data split - CALDAY_SELOPT (Calendar Day) and /DVD/DM_MATTYPE (Material Type).

Number of portions is set to 5.

The initial split found out that the query contains the following values for the selected variables:

Calendar Day:

01.01.2005 - 10.01.2005

Material Type:

NON-STOCK MATERIAL

OTHER MATERIAL

PACKAGING MATERIAL

SERVICES

TRADING GOODS

Then the variable with the highest cardinality (Calendar Day) was picked as the main partitioning variable and split into “Number of portions”, i.e. 5.

Therefore, every single task will contain a selection for 2 days and all values from Material Type. Below you can see how the selection looks like.

Selection for Calendar Day for the first job:

Selection for Material Type for the first job:

The fetcher will then intersect the selection for both variables into 10 portions and will run the query till the package size is reached. See the example of generated portions below.

Portion 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL

Portion 2.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL

Portion 3.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL

Portion 4.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES

Portion 5.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: TRADING GOODS

Portion 6.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL

Portion 7.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL

Portion 8.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL

Portion 9.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES

Portion 10.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: TRADING GOODS

Similar way the data will be processed in the rest of the mass execution jobs.

Example 3. Automated split in combination with other data splits:

In case a user wants to extract data for a specific period it might be a good idea to use the automated data split in combination with e.g. “Split data based on CSV defined values” or “Split data based on time field”. In the current example, we will use “Split data based on time field”.

In our case, the query contains data for the period 01.01.2005 - 10.01.2005, but we want to extract only the first 3 days, i.e. from 01.01.2005 to 03.01.2005.

The procedure is the following:

  1. Choose “Combined data split” in mass execution options.

2. Define parameters for the split based on the time field.

3. Define automated data split. In our case, we will select only one variable - Material Type.

The maximum number of jobs is set to 5.

Outcome:

Once the extraction is executed, the first split will generate 3 single value filters for Calendar Day:

  • 01.01.2005

  • 02.01.2005

  • 03.01.2005

The second split will generate 5 single values:

  • NON-STOCK MATERIAL

  • OTHER MATERIAL

  • PACKAGING MATERIAL

  • SERVICES

  • TRADING GOODS

Afterward, Mass Execution will intersect those values and generate 15 jobs with the following selections:

Selection 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL

Selection 2.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL

Selection 3.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL

Selection 4.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES

Selection 5.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: TRADING GOODS

Selection 6.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL

Selection 7.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL

Selection 8.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL

Selection 9.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES

Selection 10.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: TRADING GOODS

Selection 11.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL

Selection 12.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL

Selection 13.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL

Selection 14.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: SERVICES

Selection 15.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: TRADING GOODS

In this case, a secondary data split will not happen, as every execution contains only one single value per variable.