(Glue-2102) Advanced Mass Execution Capabilities
The mass execution functionality also allows you to define composite splitting methods. When the advanced data split type is used, the following button is displayed on the selection screen.
To configure advanced mass execution, proceed with the following steps:
Press the Configure button. This displays the popup shown in the picture.
Define the data split types that will be performed during the execution of the extraction process. Use the toolbar to specify the configuration. The following operations are available:
Add data split - Adds a new data split to the advanced mass execution configuration.
Remove data split - Removes the selected data split from the advanced mass execution configuration, together with any parameters already defined for it.
Define parameters - Allows you to define and change parameters for the selected data split.
After adding data split types to the mass execution configuration, define the parameters for each split you want to include: select the data split type and press the Define parameters button on the toolbar. For details on the parameters of individual split types, see the section Configuring advanced data split types.
When all parameters have been defined, confirm the selection by pressing the Confirm button. Pressing Cancel discards all changes you have made.
We recommend saving this configuration as a standard SAP variant so that you can reuse the definitions later without configuring them again.
During the execution of the extraction process, all defined split types are executed in the order given by their split number. If any data split is defined incorrectly, the whole execution ends with an error status.
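The ordering and failure behavior described above can be pictured with a short Python sketch. It is an illustration only, not the product's implementation: the `DataSplit` class, its fields, the example parameters, and the rule that a split without parameters counts as incorrectly defined are all hypothetical assumptions. For simplicity, the sketch validates every split before any of them runs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSplit:
    """Hypothetical model of one entry in the advanced configuration."""
    number: int                      # split number, defines execution order
    split_type: str                  # name of the data split type
    parameters: dict = field(default_factory=dict)


class SplitConfigError(Exception):
    """Raised when a split is defined incorrectly; the whole run fails."""


def execution_order(splits: list[DataSplit]) -> list[DataSplit]:
    """Return the splits sorted by split number, failing if any is invalid."""
    for split in splits:
        # Assumption: a split without parameters is "defined incorrectly".
        if not split.parameters:
            raise SplitConfigError(
                f"split {split.number} ({split.split_type}) has no parameters"
            )
    return sorted(splits, key=lambda s: s.number)


# Hypothetical configuration entered in the popup (parameters are made up).
config = [
    DataSplit(2, "Split data based on time field", {"field": "POSTING_DATE"}),
    DataSplit(1, "Split data based on number of rows", {"portions": 10}),
]
print([s.number for s in execution_order(config)])  # [1, 2]
```

A single invalid entry makes `execution_order` raise, mirroring the rule that one incorrect split fails the whole execution.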
Advanced mass execution explained
This section explains how advanced mass execution works, using an example with the following definition:
As you can see, the configuration defines two data split types. The first splits the data into portions based on the actual number of records in the source object; the second splits the data based on a time characteristic. For more information on split types, see the chapter Mass Execution for the basic types and the section Configuring advanced data split types for the advanced ones.
Mass execution uses the so-called Composite data split to combine the data portions generated by each specified data split type. In our example, when the extraction process is executed, Split data based on number of rows runs first because it is first in the definition. Suppose it divides the data into 10 portions. The second split, Split data based on time field, then runs and divides the data into, say, 100 portions. Next, the Composite data split logic combines the ranges that define each data portion: every portion produced by the first split is further divided into the portions defined by the second split by adding an additional filter. As a result, the data is split into 1,000 portions (10 * 100).
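The combination step can be sketched in Python as a Cartesian product of the portions produced by each split. This is a minimal sketch under stated assumptions, not the actual Composite data split logic: each portion is modelled as a dictionary of field filters, the field names `DOC_NUMBER` and `POSTING_DAY` are hypothetical, and each split is assumed to filter a different field (a real implementation would have to intersect ranges on a shared field).

```python
from itertools import product


def composite_split(*splits):
    """Combine the portions of several splits into composite portions.

    Each split is a list of portions; each portion is a dict mapping a field
    name to a (low, high) range. Every portion of the first split is divided
    further by every portion of the next split by adding that split's filter.
    Assumes the splits filter different fields (no range intersection).
    """
    combined = []
    for parts in product(*splits):
        portion = {}
        for flt in parts:
            portion.update(flt)  # add the additional filter of this split
        combined.append(portion)
    return combined


# Hypothetical portions: 10 ranges from the row-based split (modelled as
# ranges of a document number) and 100 ranges from the time-based split.
rows_split = [{"DOC_NUMBER": (i * 1000, i * 1000 + 999)} for i in range(10)]
time_split = [{"POSTING_DAY": (d, d)} for d in range(1, 101)]

portions = composite_split(rows_split, time_split)
print(len(portions))  # 1000
print(portions[0])    # {'DOC_NUMBER': (0, 999), 'POSTING_DAY': (1, 1)}
```

Adding a third split means passing another list to `composite_split`, which multiplies the number of portions again.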
Be aware that the number of data portions grows geometrically with each additional split, because it is multiplied by the number of portions that split produces. Therefore, we do not recommend using more than three splits.
However, this recommendation depends on the particular scenario, the number of parallel jobs used, and the available system resources.
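Assuming each split is applied to every portion of the previous ones, as in the example above, the total number of portions N for k splits is the product of the portion counts n_i of the individual splits:

```latex
N = \prod_{i=1}^{k} n_i
\qquad\text{e.g.}\qquad
n_1 = 10,\; n_2 = 100 \;\Rightarrow\; N = 10 \cdot 100 = 1000
```

With 10 portions per split, three splits already yield 10^3 = 1,000 portions and a fourth split yields 10^4 = 10,000.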
Configuring advanced data split types
As mentioned in the previous sections, you can define multiple data split types within an advanced configuration. The basic data split types are defined the same way as in basic mass execution, with a similar user interface. They are therefore not described here; see the chapter Mass Execution.
Composite data split
Composite data split is a general wrapper that binds the individual splits together and is always configured when the advanced data split type is selected. Therefore, it is not offered as a split type in the Advanced Mass Execution popup and is not discussed further here. For details, see the chapter Mass Execution.
Split data based on CSV defined values
This data split allows you to define data portions in a .csv file stored on the application server. During execution, the file is read from the application server and its values are translated into single-value filters that define the data portions.
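A rough Python sketch of this translation follows. It assumes the format described in the section File format (comma separator, header in the first line, values convertible to the field's data type) and adds hypothetical assumptions of its own: the file content is passed in as a string rather than read from the application server, only the column named after the partitioning field is used, each distinct value yields one single-value filter, and the `convert` callable stands in for the conversion to the field's data type. It is not the product's implementation.

```python
import csv
import io


def csv_to_filters(csv_text, partitioning_field, convert):
    """Translate comma-separated content into single-value filters.

    Hypothetical assumptions: the first line is the header, the column named
    `partitioning_field` holds the values, and each distinct value becomes
    one filter of the form {field: value}.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=",")
    if partitioning_field not in (reader.fieldnames or []):
        raise ValueError(f"header does not contain {partitioning_field!r}")
    filters, seen = [], set()
    for row in reader:
        # Raises ValueError if the value cannot be converted to the field type.
        value = convert(row[partitioning_field])
        if value not in seen:
            seen.add(value)
            filters.append({partitioning_field: value})
    return filters


# Hypothetical file content with a single column COMPANY_CODE.
sample = "COMPANY_CODE\n1000\n2000\n3000\n"
print(csv_to_filters(sample, "COMPANY_CODE", int))
# [{'COMPANY_CODE': 1000}, {'COMPANY_CODE': 2000}, {'COMPANY_CODE': 3000}]
```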
The following picture shows the configuration screen:
Partitioning field - Source field by which the data is split into portions.
File name - Path to the .csv file on the application server.
To select the file, press F4 and navigate to it in the displayed tree.
Upload file - Uploads your .csv file to the application server and sets it directly as the value of the File name parameter.
The user needs the appropriate (SAP standard) authorizations to upload a file. We therefore suggest setting up a separate folder on the application server to be used exclusively for these files.
During execution, the user who runs the process also needs authorization to read the file from that path on the application server.
File format
The file must be a .csv file with values separated by a comma (","). The first line must contain a header, and all values in each column must be convertible to the data type of the field named in that column's header. See the following example: