(Glue-2405) Advanced Mass Execution Capabilities

The Mass Execution functionality offers advanced split types, including combined data split, CSV-defined values, ABAP code, and automated data split. When an advanced data split type is used, the following button appears on the selection screen.

Configuring advanced data split types

Automated data split

The type of automated data split depends on the fetcher used in the extraction process. There are three types of automated data split: Based on the size in MB or Rows amount, Based on the Combination of Variable Values, and Based on the Combination of Characteristic Values.

In the image below, you can find the configuration screen for Automated Data Split Based on the size in MB or Rows amount:

For cluster tables, there is a checkbox Guarantee package size. Data is read from the cluster, which can also contain records from other cluster tables. As a result, a portion can turn out bigger than expected, although the portions are created quickly. When the checkbox is enabled, portions are created with a deeper search algorithm that does not exceed the selected package size.
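The trade-off behind the Guarantee package size checkbox can be illustrated with a minimal Python sketch. This is not the product's algorithm: the function names, the idea of indivisible "blocks" of records, and the simple greedy strategy are all assumptions made for illustration. The first function closes a portion as soon as the target size is reached and can overshoot it, while the second one never lets a portion exceed the target.

```python
from typing import List


def pack_fast(block_sizes: List[int], package_size: int) -> List[List[int]]:
    """Hypothetical fast mode: close a portion once it reaches the target.

    The last block added may push the portion over the package size.
    """
    portions, current, total = [], [], 0
    for size in block_sizes:
        current.append(size)
        total += size
        if total >= package_size:
            portions.append(current)
            current, total = [], 0
    if current:
        portions.append(current)
    return portions


def pack_guaranteed(block_sizes: List[int], package_size: int) -> List[List[int]]:
    """Hypothetical guaranteed mode: close a portion before it would overflow."""
    portions, current, total = [], [], 0
    for size in block_sizes:
        if size > package_size:
            raise ValueError("a single block is larger than the package size")
        if current and total + size > package_size:
            portions.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        portions.append(current)
    return portions


if __name__ == "__main__":
    sizes = [40, 70, 30, 50, 20]  # made-up block sizes
    print(pack_fast(sizes, 100))        # one portion sums to 110
    print(pack_guaranteed(sizes, 100))  # no portion sums to more than 100
```

The guaranteed variant typically produces more portions, which matches the idea that a stricter size limit costs extra work.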

In the image below, you can find the configuration screen for Automated Data Split Based on the Combination of Variable Values:

In the image below, you can find the configuration screen for Automated Data Split Based on the Combination of Characteristic Values:

ABAP code

This data split type allows you to write ABAP code that determines how the data is split into portions.

In the image below, you can find the configuration screen displayed after pressing the Configure button:

Split data based on CSV-defined values

This data split allows you to define data portions in a .csv file and store it on the application server. The data from the application server is read and translated into single-value filters that define data portions.

In the image below, you can find the configuration screen, which is displayed after pressing the Configure button:

Partitioning field - Source field based on which the data will be split into portions.
File name - Path to a .csv file on the application server.
- To select the file name, press F4 and navigate within the directory tree that is displayed.
Upload file - This allows you to upload your .csv file to the application server and sets it directly as the value of the File name parameter.

The user needs proper authorization (SAP standard) to upload a file. Therefore, we suggest setting up a separate folder on the application server that is used for these files only.

During the execution, the user that executes the process also needs to have the authorization to read the file from the particular path on the application server.

File format

The expected file format is a .csv file with values separated by a comma (","). The file must include a header in the first line, and all values in a column must be convertible to the data type of the field named in the header. Please check the following example:

File structure example (values within [] should not be present in an actual file):

/DV1/S_DMMAT [Header in the first line]
MATERIAL1
MATERIAL2
MATERIAL3
MATERIAL4
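To make the translation from file content to data portions more concrete, here is a minimal Python sketch of how a file like the one above could be turned into single-value filters. It is an illustration only, not the product's implementation: the function name, the dictionary layout, and the "EQ" operator label are assumptions, and the sketch handles only the single-column case shown in the example.

```python
import csv
import io
from typing import Dict, List


def csv_to_single_value_filters(text: str) -> List[Dict[str, str]]:
    """Turn CSV text (header line + one value per line) into one filter per value."""
    reader = csv.reader(io.StringIO(text), delimiter=",")
    rows = [row for row in reader if row]  # ignore blank lines
    if not rows:
        raise ValueError("file is empty; a header line is required")
    header, values = rows[0], rows[1:]
    field = header[0].strip()  # partitioning field named in the header
    # Each value becomes one portion: field = value (hypothetical filter layout).
    return [
        {"field": field, "operator": "EQ", "value": row[0].strip()}
        for row in values
    ]


if __name__ == "__main__":
    sample = "/DV1/S_DMMAT\nMATERIAL1\nMATERIAL2\nMATERIAL3\nMATERIAL4\n"
    for portion in csv_to_single_value_filters(sample):
        print(portion)
```

With the example file, this yields four filters, one per material, so the extraction would run in four portions.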

Combined data split

Within a configuration of combined data split, you can define multiple data split types. The definition of basic data split types is identical to using basic mass execution capabilities, with a similar user interface. Therefore, we will not detail them here and you can find more information in the chapter Mass Execution.

Combined data split represents a general wrapper that binds various split types together. Therefore, it is not present as a separate split type within the Advanced Mass Execution popup.

To configure the combined data split, you need to proceed with the following steps:

1. Press the Configure button, which displays the popup shown in the picture.
2. Define the mass execution types that will be performed during the extraction process execution. Use the provided toolbar to specify the configuration. The following operations are available:
- Add data split: Adds a new data split to the advanced mass execution configuration.
- Remove data split: Removes the selected data split from the advanced mass execution configuration, together with the parameters already defined for it.
- Define parameters: Allows you to define and change parameters for the selected data split.
3. When data split types are added to the mass execution configuration, define parameters for each split you wish to include. To do so, select the data split type and press the Define parameters button in the toolbar. For additional details on defining parameters for basic split types, please refer to the Mass Execution chapter. For advanced split types, see the section Configuring advanced data split types in this chapter.
4. Once all parameters have been defined, confirm the parameter selection by pressing the Confirm button. If you press the Cancel button, all changes you made are discarded.
5. We recommend saving this configuration as a standard SAP variant so that you can access the definitions later without configuring them again.

During the extraction process execution, all defined split types are executed in the order defined by the split number. If any data split is defined incorrectly, the whole execution ends with an error status.

Combined data split explained

This section explains how the combined data split works, using an example with the following definition:

As you can see, there are two data split types defined within the configuration. The first one splits the data into portions based on the actual number of records in the source object and the second one splits the data based on the time characteristic. To learn more about split types, see the chapter Mass Execution for basic types and the section Configuring advanced data split types for advanced data split types.

Mass execution uses the so-called Combined data split to combine the data portions generated for each specified data split type. In our example, when the extraction process is executed, Split data based on the number of rows runs first because it is first in the definition. Let's say it divides the data into 10 portions. Then the second split, Split data based on time field, runs and divides the data into, say, 100 portions. Next, the Combined data split logic combines the ranges that define each data portion: every portion defined by the first split is divided further into the portions defined by the second split by adding a filter. As a result, the data is split into 1000 (10 * 100) portions.
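The combination logic described above can be sketched in Python as a Cartesian product of the filters produced by each split. This is a simplified model, not the product's code: the tuple layout (field, low, high), the field names ROWNUM and PERIOD, and the function name are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical filter layout: (field, low, high)
Filter = Tuple[str, int, int]


def combine_splits(splits: List[Tuple[int, List[Filter]]]) -> List[Tuple[Filter, ...]]:
    """Combine the portions of several splits, ordered by split number.

    Every portion of an earlier split is paired with every portion of a
    later split, so each combined portion carries one filter per split.
    """
    ordered = [filters for _, filters in sorted(splits, key=lambda s: s[0])]
    return list(product(*ordered))


if __name__ == "__main__":
    # Split 1: 10 row-based portions (made-up ranges of 1000 rows each).
    row_split = [("ROWNUM", i * 1000 + 1, (i + 1) * 1000) for i in range(10)]
    # Split 2: 100 time-based portions (made-up period numbers).
    time_split = [("PERIOD", k, k) for k in range(1, 101)]

    combined = combine_splits([(2, time_split), (1, row_split)])
    print(len(combined))  # 1000
    print(combined[0])    # (('ROWNUM', 1, 1000), ('PERIOD', 1, 1))
```

Each combined portion carries the filter of the first split plus the added filter of the second split, which is why the portion counts multiply.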

Be aware that each additional split multiplies the number of data portions by the number of portions of that split, so the total grows geometrically. Therefore, we do not recommend using more than 3 splits.

However, this recommendation depends on the particular scenario, the number of parallel jobs used, and the actual system resources.
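A quick back-of-the-envelope calculation can help when deciding how many splits to combine. The Python sketch below is only a rough estimate under stated assumptions: the function names are hypothetical, and the "rounds" figure assumes that all portions take similar time and that jobs are filled evenly, which may not reflect how portions are actually scheduled.

```python
import math
from typing import Sequence


def total_portions(portions_per_split: Sequence[int]) -> int:
    """Total number of combined portions is the product of the per-split counts."""
    return math.prod(portions_per_split)


def rounds_needed(total: int, parallel_jobs: int) -> int:
    """Rough number of sequential rounds if each job handles one portion at a time."""
    return math.ceil(total / parallel_jobs)


if __name__ == "__main__":
    print(total_portions([10, 100]))      # 1000
    print(total_portions([10, 100, 12]))  # 12000
    print(rounds_needed(1000, 8))         # 125
```

Adding a third split with only 12 portions already raises the total from 1000 to 12000, which illustrates why the number of splits should be kept small.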