(Glue-2305) Automated Data Split Based on the Combination of Variable Values
This partitioning logic splits the source data automatically based on selected query variables. Using this data split allows extracting the whole query data by smaller portions thus avoiding memory overflow issues. This data split is relevant only for BEx Query Fetcher (BICS).
The data split occurs on two levels:
Initial split during generation of Mass Execution tasks. The values of a variable with the highest cardinality are equally split into {n} ranges in combination with values from other variables marked in the automated data split screen, where n is is the Number of portions parameter defined configuration screen.
Secondary split in BEx Query Fetcher (BICS) during the extraction. Once the next data package is requested, the fetcher will split the selection by the smallest portions and sequentially execute the query with those selections till the package size is reached.
In the next picture you can find the configuration screen:
Example 1. Automated split based on 1 variable:
In Mass Execution we selected one variable for automated data split - CALDAY_SELOPT (Calendar Day).
On Extraction Process main screen we specify static filter - two values for variable /DVD/DM_MATTYPE (Material Type):
SERVICES
TRADING GOODS
Number of portions is set to 2.
The initial split found out that the query contains the following values for the Calendar Day:
01.01.2005 - 10.01.2005
Next up, those values are split into “Number of portions”, i.e. 2.
Therefore, every single task contains data for 5 days and the selection which the user-specified for Material Type. Below you can see how the selection looks like.
Selection for Calendar Day for the first job:
Selection for Material Type for the first job:
Similarly, the data will be split for the second mass execution job. There will be Calendar Day between 06.01.2005 and 10.01.2005 and the same selection for Material Type.
The fetcher will then combine every single Calendar Day with the static selection for Material Type (selection specified by user) into 5 portions and will run query till the package size is reached. See the example of generated portions below.
Portion 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 2.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 3.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 4.
CALDAY_SELOPT: 04.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 5.
CALDAY_SELOPT: 05.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
In a similar way the data will be processed in the second mass execution job:
Portion 1.
CALDAY_SELOPT: 06.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 2.
CALDAY_SELOPT: 07.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 3.
CALDAY_SELOPT: 08.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 4.
CALDAY_SELOPT: 09.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Portion 5.
CALDAY_SELOPT: 10.01.2005
/DVD/DM_MATTYPE: SERVICES and TRADING GOODS
Example 2. Automated split based on two variables:
In Mass Execution we selected two variables for automated data split - CALDAY_SELOPT (Calendar Day) and /DVD/DM_MATTYPE (Material Type).
Number of portions is set to 5.
The initial split found out that the query contains the following values for the selected variables:
Calendar Day:
01.01.2005 - 10.01.2005
Material Type:
NON-STOCK MATERIAL
OTHER MATERIAL
PACKAGING MATERIAL
SERVICES
TRADING GOODS
Then the variable with the highest cardinality (Calendar Day) was picked as the main partitioning variable and split into “Number of portions”, i.e. 5.
Therefore, every single task will contain a selection for 2 days and all values from Material Type. Below you can see how the selection looks like.
Selection for Calendar Day for the first job:
Selection for Material Type for the first job:
The fetcher will then intersect the selection for both variables into 10 portions and will run the query till the package size is reached. See the example of generated portions below.
Portion 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL
Portion 2.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL
Portion 3.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL
Portion 4.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES
Portion 5.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: TRADING GOODS
Portion 6.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL
Portion 7.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL
Portion 8.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL
Portion 9.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES
Portion 10.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: TRADING GOODS
Similar way the data will be processed in the rest of the mass execution jobs.
Example 3. Automated split in combination with other data splits:
In case a user wants to extract data for a specific period it might be a good idea to use the automated data split in combination with e.g. “Split data based on CSV defined values” or “Split data based on time field”. In the current example, we will use “Split data based on time field”.
In our case, the query contains data for the period 01.01.2005 - 10.01.2005, but we want to extract only the first 3 days, i.e. from 01.01.2005 to 03.01.2005.
The procedure is the following:
Choose “Combined data split” in mass execution options.
2. Define parameters for the split based on the time field.
3. Define automated data split. In our case, we will select only one variable - Material Type.
The maximum number of jobs is set to 5.
Outcome:
Once the extraction is executed, the first split will generate 3 single value filters for Calendar Day:
01.01.2005
02.01.2005
03.01.2005
The second split will generate 5 single values:
NON-STOCK MATERIAL
OTHER MATERIAL
PACKAGING MATERIAL
SERVICES
TRADING GOODS
Afterward, Mass Execution will intersect those values and generate 15 jobs with the following selections:
Selection 1.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL
Selection 2.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL
Selection 3.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL
Selection 4.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: SERVICES
Selection 5.
CALDAY_SELOPT: 01.01.2005
/DVD/DM_MATTYPE: TRADING GOODS
Selection 6.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL
Selection 7.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL
Selection 8.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL
Selection 9.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: SERVICES
Selection 10.
CALDAY_SELOPT: 02.01.2005
/DVD/DM_MATTYPE: TRADING GOODS
Selection 11.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: NON-STOCK MATERIAL
Selection 12.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: OTHER MATERIAL
Selection 13.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: PACKAGING MATERIAL
Selection 14.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: SERVICES
Selection 15.
CALDAY_SELOPT: 03.01.2005
/DVD/DM_MATTYPE: TRADING GOODS
In this case, a secondary data split will not happen, as every execution contains only one single value per variable.