(Glue-2408) Automated Data Split Based on the Combination of Variable Values
This partitioning logic splits the source data automatically based on selected query variables. Using this data split allows you to extract all the query data by smaller portions thus avoiding memory overflow issues. This data split is relevant only for BEx Query Fetcher (BICS).
The data split occurs on two levels:
Initial split during the generation of Mass Execution tasks. The values of a variable with the highest cardinality are equally split into N ranges in combination with values from other variables marked in the automated data split screen, where N is is the Number of portions parameter defined in the configuration screen.
Secondary split in a BEx Query Fetcher (BICS) during the extraction. Once the next data package is requested, the fetcher will split the selection by the smallest portions and sequentially execute the query with those selections until the package size is reached.
Example 1: Automated split based on 1 variable:
In the Mass Execution, we have selected one variable for the automated data split: CALDAY_SELOPT (calendar day).
On the Extraction Process main screen, we have specified static filter: two values for the variable /DVD/DM_MATTYPE (material type):
SERVICES
TRADING GOODS
The Number of portions field is set to 2.
The initial split determined that the query contains the following values for the calendar day:
01.01.2005 - 10.01.2005
Next, these values are split into the defined number of portions 2.
Therefore, every single task will contain data for 5(10/2) days as well as the user-specified selection for the material type. The selection looks like this:
Selection for calendar day for the first job:
Selection for material type for the first job:
Similarly, the data will be split for the second mass execution job. The calendar day will be set to between 06.01.2005 and 10.01.2005 and the same selection for Material Type.
The fetcher will then combine every single Calendar Day with the static selection for Material Type (selection specified by user) into 5 portions and will run query till the package size is reached. See the example of generated portions below.
Portion 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 2.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 3.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 4.
CALDAY_SELOPT: 04.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 5.
CALDAY_SELOPT: 05.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
In a similar way the data will be processed in the second mass execution job:
Portion 1.
CALDAY_SELOPT: 06.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 2.
CALDAY_SELOPT: 07.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 3.
CALDAY_SELOPT: 08.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 4.
CALDAY_SELOPT: 09.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 5.
CALDAY_SELOPT: 10.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Example 2. Automated split based on two variables:
In the Mass Execution, we have selected two variables for automated data split - CALDAY_SELOPT (calendar day) and /DVD/DM_MATTYPE (material type).
The Number of portions field is set to 5.
The initial split determine that the query contains the following values for the selected variables:
Calendar Day:
01.01.2005 - 10.01.2005
Material Type:
NON-STOCK MATERIAL
OTHER MATERIAL
PACKAGING MATERIAL
SERVICES
TRADING GOODS
Then the variable with the highest cardinality (Calendar Day) was picked as the main partitioning variable and split into the defined number of portions 5.
Therefore, every single task will contain a selection for 2(10/5) days and all values from Material Type. Below you can see how the selection looks like.
Selection for calendar day for the first job:
Selection for material type for the first job:
The fetcher then intersects the selection for both variables into 10 portions and executes the query until the package size is reached. See the example of the generated portions below.
Portion 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL
Portion 2.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL
Portion 3.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL
Portion 4.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES
Portion 5.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: TRADING GOODS
Portion 6.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL
Portion 7.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL
Portion 8.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL
Portion 9.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES
Portion 10.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: TRADING GOODS
Similar way the data will be processed in the rest of the mass execution jobs.
Example 3: Automated split in combination with other data splits:
In case a user wants to extract data for a specific period it might be a good idea to use the automated data split in combination with e.g. “Split data based on CSV defined values” or “Split data based on time field”. In the current example, we will use “Split data based on time field”.
In our case, the query contains data for the period 01.01.2005 - 10.01.2005, but we want to extract only the first 3 days, i.e. from 01.01.2005 to 03.01.2005.
The procedure is the following:
Choose “Combined data split” in mass execution options.
2. Define parameters for the split based on the time field.
3. Define automated data split. In our case, we will select only one variable - Material Type.
The maximum number of jobs is set to 5.
Outcome:
Once the extraction is executed, the first split will generate 3 single value filters for Calendar Day:
01.01.2005
02.01.2005
03.01.2005
The second split will generate 5 single values:
NON-STOCK MATERIAL
OTHER MATERIAL
PACKAGING MATERIAL
SERVICES
TRADING GOODS
Afterward, Mass Execution will intersect those values and generate 15 jobs with the following selections:
Selection 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL
Selection 2.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL
Selection 3.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL
Selection 4.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES
Selection 5.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: TRADING GOODS
Selection 6.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL
Selection 7.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL
Selection 8.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL
Selection 9.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES
Selection 10.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: TRADING GOODS
Selection 11.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL
Selection 12.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL
Selection 13.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL
Selection 14.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: SERVICES
Selection 15.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: TRADING GOODS
In this case, a secondary data split will not happen, because every execution contains only a single value per variable.