(Glue-2405) Automated Data Split Based on the Combination of Variable Values

This partitioning logic splits the source data automatically based on selected query variables. Using this data split allows you to extract all the query data by smaller portions thus avoiding memory overflow issues. This data split is relevant only for BEx Query Fetcher (BICS).

The data split occurs on two levels:

  1. Initial split during the generation of Mass Execution tasks. The values of a variable with the highest cardinality are equally split into N ranges in combination with values from other variables marked in the automated data split screen, where N is is the Number of portions parameter defined in the configuration screen.

  2. Secondary split in a BEx Query Fetcher (BICS) during the extraction. Once the next data package is requested, the fetcher will split the selection by the smallest portions and sequentially execute the query with those selections until the package size is reached.

Example 1: Automated split based on 1 variable:

In the Mass Execution, we have selected one variable for the automated data split: CALDAY_SELOPT (calendar day).

On the Extraction Process main screen, we have specified static filter: two values for the variable /DVD/DM_MATTYPE (material type):

  • SERVICES

  • TRADING GOODS

The Number of portions field is set to 2.

The initial split determined that the query contains the following values for the calendar day:

  • 01.01.2005 - 10.01.2005

Next, these values are split into the defined number of portions 2.

Therefore, every single task will contain data for 5(10/2) days as well as the user-specified selection for the material type. The selection looks like this:

Selection for calendar day for the first job:

Selection for material type for the first job:

Similarly, the data will be split for the second mass execution job. The calendar day will be set to between 06.01.2005 and 10.01.2005 and the same selection for Material Type.

The fetcher will then combine every single Calendar Day with the static selection for Material Type (selection specified by user) into 5 portions and will run query till the package size is reached. See the example of generated portions below.

Portion 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 2.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 3.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 4.
CALDAY_SELOPT: 04.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 5.
CALDAY_SELOPT: 05.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

In a similar way the data will be processed in the second mass execution job:

Portion 1.
CALDAY_SELOPT: 06.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 2.
CALDAY_SELOPT: 07.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 3.
CALDAY_SELOPT: 08.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 4.
CALDAY_SELOPT: 09.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Portion 5.
CALDAY_SELOPT: 10.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS

Example 2. Automated split based on two variables:

In the Mass Execution, we have selected two variables for automated data split - CALDAY_SELOPT (calendar day) and /DVD/DM_MATTYPE (material type).

The Number of portions field is set to 5.

The initial split determine that the query contains the following values for the selected variables:

Calendar Day:

01.01.2005 - 10.01.2005

Material Type:

NON-STOCK MATERIAL

OTHER MATERIAL

PACKAGING MATERIAL

SERVICES

TRADING GOODS

Then the variable with the highest cardinality (Calendar Day) was picked as the main partitioning variable and split into the defined number of portions 5.

Therefore, every single task will contain a selection for 2(10/5) days and all values from Material Type. Below you can see how the selection looks like.

Selection for calendar day for the first job:

Selection for material type for the first job:

The fetcher then intersects the selection for both variables into 10 portions and executes the query until the package size is reached. See the example of the generated portions below.

Portion 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL

Portion 2.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL

Portion 3.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL

Portion 4.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES

Portion 5.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: TRADING GOODS

Portion 6.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL

Portion 7.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL

Portion 8.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL

Portion 9.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES

Portion 10.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: TRADING GOODS

Similar way the data will be processed in the rest of the mass execution jobs.

Example 3: Automated split in combination with other data splits:

In case a user wants to extract data for a specific period it might be a good idea to use the automated data split in combination with e.g. “Split data based on CSV defined values” or “Split data based on time field”. In the current example, we will use “Split data based on time field”.

In our case, the query contains data for the period 01.01.2005 - 10.01.2005, but we want to extract only the first 3 days, i.e. from 01.01.2005 to 03.01.2005.

The procedure is the following:

  1. Choose “Combined data split” in mass execution options.

2. Define parameters for the split based on the time field.

3. Define automated data split. In our case, we will select only one variable - Material Type.

The maximum number of jobs is set to 5.

Outcome:

Once the extraction is executed, the first split will generate 3 single value filters for Calendar Day:

  • 01.01.2005

  • 02.01.2005

  • 03.01.2005

The second split will generate 5 single values:

  • NON-STOCK MATERIAL

  • OTHER MATERIAL

  • PACKAGING MATERIAL

  • SERVICES

  • TRADING GOODS

Afterward, Mass Execution will intersect those values and generate 15 jobs with the following selections:

Selection 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL

Selection 2.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL

Selection 3.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL

Selection 4.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES

Selection 5.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: TRADING GOODS

Selection 6.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL

Selection 7.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL

Selection 8.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL

Selection 9.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES

Selection 10.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: TRADING GOODS

Selection 11.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL

Selection 12.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL

Selection 13.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL

Selection 14.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: SERVICES

Selection 15.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: TRADING GOODS

In this case, a secondary data split will not happen, because every execution contains only a single value per variable.