(Glue-2102) Mass Execution
The Extraction process of Extractor 2.0 supports mass execution functionality allowing you to execute the process in parallel. Mass execution splits data into multiple portions which are scheduled and executed as separate tasks.
To execute Extractor 2.0 in mass execution mode, follow these steps:
Go to the extraction process and press Execute extraction.
As an execution type choose Mass Execution.
After this action, you should see Mass Execution parameters on the screen.
Fill in the Job configuration. This section specifies the properties of the background jobs used to execute the tasks that extract each data portion.
Maximum number of jobs - This parameter specifies the number of background jobs used for extraction.
Application server - Optional input, specifying an application server executing the process. If left empty, the current application server will be used.
Application server group - Optional input, specifying an application server group executing the process. If left empty, the current application server will be used.
Keep alive - When checked, another background job is opened if a previous task ended with an unexpected error. This way, the level of parallelism is kept at the specified level.
We recommend you always keep this flag checked.
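The effect of the Keep alive flag can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the actual scheduling logic of Extractor 2.0: the function `simulate`, its parameters, and the round-based model (each active job takes one task per round, and a crashing task takes its job down) are hypothetical.

```python
from collections import deque


def simulate(tasks, max_jobs, keep_alive, crashing_tasks):
    """Return the tasks processed in each round of parallel execution.

    Hypothetical model: each active job takes one task per round. A task in
    `crashing_tasks` ends with an unexpected error and takes its job down.
    """
    queue = deque(tasks)
    active_jobs = max_jobs
    rounds = []
    while queue and active_jobs > 0:
        batch = [queue.popleft() for _ in range(min(active_jobs, len(queue)))]
        rounds.append(batch)
        crashed = sum(1 for task in batch if task in crashing_tasks)
        if not keep_alive:
            # Without Keep alive, a crashed job is not replaced,
            # so fewer jobs are available in the following rounds.
            active_jobs -= crashed
    return rounds


tasks = list(range(1, 9))  # eight data portions
print(simulate(tasks, max_jobs=3, keep_alive=True, crashing_tasks={2, 3}))
print(simulate(tasks, max_jobs=3, keep_alive=False, crashing_tasks={2, 3}))
```

In this model, the run with Keep alive finishes in three rounds, while the run without it drops to a single job after the first round and needs six rounds for the same tasks.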
After configuring the jobs, select the Data partitioning type. This parameter defines the logic used to split the data into portions. Depending on the selected value, different parameters relevant to the selected partitioning type are shown on the screen. These are described further below.
You can choose from the following data partitioning types:
Split data based on number of records
Split data based on size in MB
Split data based on time field
Split data based on fiscal period
Fill in the parameters required for the selected data partitioning logic.
The relevant parameters are described in separate sections below, one for each partitioning logic.
Press Execute (F8).
Depending on the partitioning type, splitting the data can be a time-consuming operation. Therefore, you can also choose Background execution in mass execution mode. For the data partitioning types Split data based on number of records and Split data based on size in MB, we recommend using Background execution.
Important note on selection handling:
After the data split, the mass execution functionality creates an additional selection that is designed to select only a particular data portion. However, this additional selection is only added to the existing selection (if specified) and does not replace it.
Therefore, be careful when defining the mass execution, because an existing selection on the partitioning field can make the mass execution split ineffective.
E.g., when you split data based on the calendar year and provide a range of years as the selection for the calendar year field, this selection is present in each mass execution task. As a result, each task selects all data that matches either your manually specified range or the range generated by mass execution.
Therefore, we recommend leaving the selection of the partitioning field empty unless you are executing a specific case, e.g., extraction of all data between 2010 and 2020 except the year 2012.
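The selection behavior described in this note can be sketched as follows. The example assumes that ranges on the same field are combined with OR, as the calendar year example describes; `YearRange`, `years_selected`, and the sample years are hypothetical names and values used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YearRange:
    """Inclusive range on the partitioning field (hypothetical helper)."""
    low: int
    high: int

    def matches(self, year: int) -> bool:
        return self.low <= year <= self.high


def years_selected(source_years, ranges):
    # Assumption: ranges on the same field are combined with OR,
    # so a record is selected if it matches any of them.
    return [y for y in source_years if any(r.matches(y) for r in ranges)]


source_years = list(range(2008, 2023))
user_selection = YearRange(2010, 2020)                       # entered manually
task_ranges = [YearRange(y, y) for y in (2018, 2019, 2020)]  # one per task

for task_range in task_ranges:
    with_user = years_selected(source_years, [user_selection, task_range])
    alone = years_selected(source_years, [task_range])
    print(task_range, "with user selection:", with_user, "| alone:", alone)
```

With the manual range present, every task selects all years from 2010 to 2020, so the split has no effect. With the partitioning field left empty, each task selects only its own year.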
If your existing selection conflicts with the partitioning parameters, the popup below is displayed.
Split data based on number of records
This logic splits the source data based on the number of records, comparing the values of the selected field while trying to match the required row limit.
In the next picture, you can see the selection screen of the Extractor 2.0 process showing parameters relevant for Split data based on number of records.
You need to fill in these parameters:
Partitioning field - Field of the source structure used to split the data into portions.
This field also needs to be present in the selection of the Fetcher used by the extraction process.
Limit - Desired number of rows for a single data portion.
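One way to picture this kind of split is a greedy grouping of the sorted field values. The following is a minimal sketch under that assumption, not the actual algorithm used by Extractor 2.0; the function `split_by_record_count` and the sample data are hypothetical.

```python
from collections import Counter


def split_by_record_count(values, limit):
    """Group sorted distinct field values into ranges of roughly `limit` rows.

    Returns a list of (low, high, row_count) tuples. A single value is never
    split, so a portion can exceed the limit when one value alone has more
    rows than the limit.
    """
    counts = Counter(values)
    portions = []
    low = high = None
    rows = 0
    for value in sorted(counts):
        n = counts[value]
        if low is not None and rows + n > limit:
            # Adding this value would exceed the limit: close the portion.
            portions.append((low, high, rows))
            low, rows = None, 0
        if low is None:
            low = value
        high = value
        rows += n
    if low is not None:
        portions.append((low, high, rows))
    return portions


# Partitioning field values of 15 source records (sample data).
years = [2018] * 3 + [2019] * 4 + [2020] * 2 + [2021] * 6
print(split_by_record_count(years, limit=7))
```

In this sketch, the limit is a target rather than a hard cap, which matches the description of the logic as trying to match the required row limit.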
Split data based on size in MB
This logic splits the source data based on data size in MB, comparing the values of the selected field while trying to match the required size limit.
In the next picture, you can see the selection screen of the Extractor 2.0 process showing parameters relevant for Split data based on size in MB.
You need to fill in these parameters:
Partitioning field - Field of the source structure used to split the data into portions.
Limit - Desired size in MB for a single data portion.
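To get a feeling for a suitable Limit value, you can estimate how many rows fit into one portion and how many portions will result. The arithmetic below is a rough sketch that assumes the data size is approximately the row count multiplied by an average row width; how Extractor 2.0 actually measures the size is not described here, and both function names are hypothetical.

```python
import math


def rows_per_portion(limit_mb, row_width_bytes):
    """Approximate number of rows that fit into one portion of `limit_mb` MB."""
    return max(1, int(limit_mb * 1024 * 1024) // row_width_bytes)


def estimated_portions(total_rows, limit_mb, row_width_bytes):
    """Approximate number of portions (and therefore tasks) for the source data."""
    return math.ceil(total_rows / rows_per_portion(limit_mb, row_width_bytes))


# Sample values: 1,000,000 rows, 512 bytes per row, limit of 100 MB.
print(rows_per_portion(100, 512))
print(estimated_portions(1_000_000, 100, 512))
```

With these sample values, one portion holds about 204,800 rows, so the data would be split into roughly five portions.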
Split data based on time field
This partitioning logic splits the source data based on the defined time value. Using a time characteristic field, range, and unit definition, a separate task is created for each value within the given range.
In the next picture, you can see the selection screen of the Extractor 2.0 process showing parameters relevant for Split data based on time field.
You need to fill in these parameters:
Time field for partitioning - Field of the source structure used to split the data into portions.
Time shift for start - This value represents the number of time units defining the start of the range relative to the current date.
E.g., if you provide 10 as a value and use Year(s) as a unit, you define your range to start 10 years before the current year.
Step size - Number of time units covered by a single range.
Assuming the selection from the image above (step defined as 1 and unit as Year(s)), only values matching a single year are covered in a single extraction task.
Time shift end - This value represents the number of time units defining the end of the range relative to the current date.
E.g., if you provide 5 as a value and use Year(s) as a unit, you define your range to end 5 years before the current year.
Unit - Unit for range calculation. Day(s), Month(s), and Year(s) are the supported units for the time field-based data split. The time unit is not explicitly labeled on the selection screen but is available as a dropdown list next to the previous parameters.
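The interplay of the time shift, step size, and unit parameters can be sketched as follows. This is an illustration only: the function `time_portions`, the unit names, and the assumption that both ends of the range are inclusive are not taken from the product.

```python
from datetime import date, timedelta


def _month_start(d, months):
    """First day of the month that lies `months` months after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def time_portions(today, shift_start, shift_end, step, unit):
    """Return inclusive (first_day, last_day) ranges, one per extraction task."""
    if step < 1:
        raise ValueError("step must be at least 1")
    if unit == "day":
        start = today - timedelta(days=shift_start)
        stop = today - timedelta(days=shift_end - 1)  # exclusive end

        def advance(d):
            return d + timedelta(days=step)
    elif unit in ("month", "year"):
        factor = 1 if unit == "month" else 12
        anchor = today if unit == "month" else date(today.year, 1, 1)
        start = _month_start(anchor, -shift_start * factor)
        stop = _month_start(anchor, (1 - shift_end) * factor)  # exclusive end

        def advance(d):
            return _month_start(d, step * factor)
    else:
        raise ValueError(f"unsupported unit: {unit}")

    portions = []
    while start < stop:
        nxt = min(advance(start), stop)
        portions.append((start, nxt - timedelta(days=1)))
        start = nxt
    return portions


# Start 10 years back, end 5 years back, one year per task.
for first, last in time_portions(date(2024, 6, 15), 10, 5, 1, "year"):
    print(first, "to", last)
```

For a current date in 2024, these values produce six tasks, one for each year from 2014 to 2019. A step size of 2 would instead produce three tasks covering two years each.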
Split data based on fiscal period
This partitioning logic splits the source data based on the defined fiscal year/period value. Using the Fiscal Year or Fiscal Period field of the source structure, the Fiscal Year Variant, and the range and unit definition, a separate task is created for each fiscal value in the given range.
In the next picture, you can see the selection screen of the Extractor 2.0 process showing parameters relevant for Split data based on fiscal period.
You need to fill in these parameters:
Time field for partitioning - Field of the source structure used to split the data into portions.
Time shift for start - This value represents the number of time units defining the start of the range relative to the current date.
E.g., if you provide 10 as a value and use Posting Period(s) as a unit, you define your range to start 10 posting periods before the current date.
Fiscal year variant - Fiscal year variant used for fiscal period definition. You can use the provided F4 help to select one of the Fiscal year variants available in the system.
Step size - Number of time units covered by a single range.
Assuming the selection from the image above (step defined as 1 and unit as Posting Period(s)), only values matching a single posting period are covered in a single extraction task.
Time shift end - This value represents the number of time units defining the end of the range relative to the current date.
E.g., if you provide 5 as a value and use Posting Period(s) as a unit, you define your range to end 5 posting periods before the current date.
Unit - Unit for range calculation. Posting Period(s) and Fiscal Year(s) are the supported units for the fiscal period-based data split. The unit is not explicitly labeled on the selection screen but is available as a dropdown list next to the previous parameters.
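The range calculation for posting periods can be sketched in a similar way. The example below assumes a hypothetical fiscal year variant with 12 posting periods whose fiscal year starts in April and is named after the calendar year in which it starts; real variants are defined in the system and can differ. The function names and the year/period output format are also assumptions made for illustration.

```python
from datetime import date

PERIODS_PER_YEAR = 12          # assumption: no special periods
FISCAL_YEAR_START_MONTH = 4    # assumption: fiscal year starts in April


def current_period_index(today):
    """Running index of the posting period that contains `today`."""
    period_offset = (today.month - FISCAL_YEAR_START_MONTH) % 12
    if today.month >= FISCAL_YEAR_START_MONTH:
        fiscal_year = today.year
    else:
        fiscal_year = today.year - 1
    return fiscal_year * PERIODS_PER_YEAR + period_offset


def format_period(index):
    """Render a running period index as fiscal year / posting period."""
    year, period = divmod(index, PERIODS_PER_YEAR)
    return f"{year}/{period + 1:03d}"


def fiscal_portions(today, shift_start, shift_end, step):
    """Return inclusive (first_period, last_period) ranges, one per task."""
    current = current_period_index(today)
    first, last = current - shift_start, current - shift_end
    portions = []
    for low in range(first, last + 1, step):
        high = min(low + step - 1, last)
        portions.append((format_period(low), format_period(high)))
    return portions


# Start 10 posting periods back, end 5 back, one period per task.
for low, high in fiscal_portions(date(2024, 6, 15), 10, 5, 1):
    print(low, "to", high)
```

Under these assumptions, 15 June 2024 falls into posting period 3 of fiscal year 2024, so the example creates six tasks covering periods 2023/005 through 2023/010.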
Advanced mass execution capabilities
Mass execution functionality also allows you to define composite splitting methods. When an advanced data split type is used, you can see the following button on the selection screen.
For a complete guide on how to set up advanced mass execution and understand this functionality, please see this chapter https://datavard.atlassian.net/wiki/spaces/GLUE/pages/1950154894.