(Glue-2002) Mass Execution

The Extractor 2.0 extraction process supports mass execution, which lets you run the extraction in parallel. Mass execution splits the data into multiple portions that are scheduled and executed as separate tasks.

To execute Extractor 2.0 in mass execution mode, follow these steps:

  1. Go to the extraction process and press Execute extraction.

  2. Choose Mass Execution as the execution type.

    The Mass Execution parameters then appear on the screen.

  3. Fill in the Job configuration. This section specifies the properties of the background jobs that execute the tasks extracting each data portion.

    1. Maximum number of jobs - Specifies the maximum number of background jobs used for the extraction.

    2. Application server - Optional. Specifies the application server that executes the process. If left empty, the current application server is used.

    3. Keep alive - When checked, another background job is started if a previous task ends with an unexpected error. This keeps the degree of parallelism at the specified level.
      We recommend always keeping this flag checked.

  4. After configuring the jobs, select the Data partitioning type. This parameter defines the logic used to split the data into portions. The parameters shown on the screen depend on the selected partitioning type and are described in the sections below.
    You can choose from the following data partitioning types:

    1. Split data based on number of records

    2. Split data based on size in MB

    3. Split data based on time field

    4. Split data based on fiscal period

  5. Fill in the parameters required for the selected data partitioning logic.
    The relevant parameters are described in the section for each partitioning logic.

  6. Press Execute (F8).

Depending on the partitioning type, splitting the data can be a time-consuming operation. You can therefore also choose Background execution in mass execution mode. For the data partitioning types Split data based on number of records and Split data based on size in MB, we recommend using Background execution.

Important note on selection handling:

After the data split, mass execution creates an additional selection designed to select only one particular data portion. However, this additional selection is only added to the existing selection (if one is specified); it does not replace the original selection.

Be careful when defining the mass execution, because an existing selection can make the mass execution split ineffective.
For example, if you split the data by calendar year and also provide a range of years as the selection for the calendar year field, that range is present in every mass execution task. As a result, each task selects all data that matches either your manually specified range or the range generated by mass execution.

We therefore recommend leaving the selection for the partitioning field empty unless you have a specific case, e.g. extracting all data between 2010 and 2020 except the year 2012.
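
The interplay between a manual selection and the generated one can be sketched in Python. This is only an illustration, not Glue code: it assumes the combined selection behaves like a range table in which include rows for the same field are OR-ed and exclude rows are subtracted. The matches function and the (sign, low, high) row layout are hypothetical.

```python
def matches(selection, value):
    """Return True if value passes a list of (sign, low, high) rows.

    Assumption: include rows ("I") are OR-ed, exclude rows ("E") are
    subtracted, and a selection without include rows accepts everything.
    """
    includes = [(low, high) for sign, low, high in selection if sign == "I"]
    excludes = [(low, high) for sign, low, high in selection if sign == "E"]
    included = not includes or any(low <= value <= high for low, high in includes)
    excluded = any(low <= value <= high for low, high in excludes)
    return included and not excluded


years = list(range(2008, 2023))
task_2015 = ("I", 2015, 2015)  # range generated for one mass execution task

# Manual range on the partitioning field: the task still reads 2010 to 2020.
manual_range = [("I", 2010, 2020)]
print([y for y in years if matches(manual_range + [task_2015], y)])

# No manual selection: the task reads only its own portion.
print([y for y in years if matches([task_2015], y)])

# Manual exclusion only: the task generated for 2012 reads nothing.
manual_exclusion = [("E", 2012, 2012)]
print([y for y in years if matches(manual_exclusion + [("I", 2012, 2012)], y)])
```

Under these assumptions a manual include range defeats the split, while a manual exclusion combines safely with the generated ranges.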

Split data based on number of records

This logic splits the source data by number of records, comparing the values of the selected field while trying to match the required row limit.

The next picture shows the selection screen of the Extractor 2.0 process with the parameters relevant for Split data based on number of records.

You need to fill in these parameters:

  • Partitioning field - Field of the source structure used to split the data into portions.
    This field must also be present in the selection of the fetcher used by the extraction process.

  • Limit - Desired number of rows for a single data portion.
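
The exact splitting algorithm is not described here, so the following Python sketch only illustrates the idea of matching a row limit by comparing field values. It assumes that rows sharing a partitioning field value always stay in the same portion and that the sorted distinct values are packed greedily. The function name split_by_record_count is hypothetical.

```python
from collections import Counter


def split_by_record_count(values, limit):
    """Group sorted distinct field values into portions of about `limit` rows.

    Returns a list of (low_value, high_value, row_count) tuples.
    Assumption: a single field value is never divided between portions,
    so a value with more rows than `limit` forms an oversized portion.
    """
    counts = Counter(values)
    portions = []
    current, current_rows = [], 0
    for value in sorted(counts):
        rows = counts[value]
        # Close the current portion if adding this value would exceed the limit.
        if current and current_rows + rows > limit:
            portions.append((current[0], current[-1], current_rows))
            current, current_rows = [], 0
        current.append(value)
        current_rows += rows
    if current:
        portions.append((current[0], current[-1], current_rows))
    return portions


# One entry per source row, holding that row's partitioning field value.
field_values = ["A"] * 3 + ["B"] * 2 + ["C"] * 4 + ["D"] * 1
print(split_by_record_count(field_values, limit=5))
```

Each resulting value range would then become the additional selection of one task. Because the portions follow value boundaries, the row limit is a target rather than a strict maximum.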

Split data based on size in MB

This logic splits the source data by data size in MB, comparing the values of the selected field while trying to match the required size limit.

The next picture shows the selection screen of the Extractor 2.0 process with the parameters relevant for Split data based on size in MB.

You need to fill in these parameters:

  • Partitioning field - Field of the source structure used to split the data into portions.

  • Limit - Desired size in MB for a single data portion.

Split data based on time field

This partitioning logic splits the source data based on a defined time value. Using the time characteristic field together with the range and unit definition, a separate task is created for each value within the given range.

The next picture shows the selection screen of the Extractor 2.0 process with the parameters relevant for Split data based on time field.

You need to fill in these parameters:

  • Time field for partitioning - Field of the source structure used to split the data into portions.

  • Time shift for start - Number of time units defining the start of the range relative to the current date.
    E.g. if you provide 10 as the value and use Year(s) as the unit, the range starts 10 years before the current year.

  • Step size - Number of time units covered by a single range.
    With the selection from the image above (step defined as 1 and unit as Year(s)), each extraction task covers only the values matching a single year.

  • Time shift end - Number of time units defining the end of the range relative to the current date.
    E.g. if you provide 5 as the value and use Year(s) as the unit, the range ends 5 years before the current year.

  • Unit - Unit for the range calculation. Day(s), Month(s) and Year(s) are the supported units for the time field based data split. The unit is not explicitly labeled on the selection screen but appears as a dropdown list next to the previous parameters.
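
How the four parameters combine into task ranges can be sketched in Python. The sketch is an illustration under stated assumptions, not the product's implementation: ranges are aligned to whole calendar units, both time shifts are inclusive, and the last range is cut off at the end shift. The function names are hypothetical.

```python
from datetime import date, timedelta


def shift_period_start(today, unit, shift):
    """First day of the calendar unit lying `shift` units before `today`."""
    if unit == "day":
        return today - timedelta(days=shift)
    if unit == "month":
        index = today.year * 12 + (today.month - 1) - shift
        return date(index // 12, index % 12 + 1, 1)
    if unit == "year":
        return date(today.year - shift, 1, 1)
    raise ValueError(f"unsupported unit: {unit}")


def time_portions(today, shift_start, shift_end, step, unit):
    """Return one (low_date, high_date) range per task."""
    portions = []
    shift = shift_start
    while shift >= shift_end:
        low = shift_period_start(today, unit, shift)
        # Last unit covered by this range, never past the end shift.
        last_shift = max(shift - step + 1, shift_end)
        # High date is the day before the unit following `last_shift` begins.
        high = shift_period_start(today, unit, last_shift - 1) - timedelta(days=1)
        portions.append((low, high))
        shift -= step
    return portions


# Start 10 years back, end 5 years back, one year per task.
for low, high in time_portions(date(2024, 6, 15), 10, 5, 1, "year"):
    print(low, high)
```

With a current date in 2024, a start shift of 10, an end shift of 5 and a step of 1 year, the sketch yields six ranges covering the years 2014 to 2019.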

Split data based on fiscal period

This partitioning logic splits the source data based on a defined fiscal year/period value. Using the Fiscal Year or Fiscal Period field of the source structure, the Fiscal Year Variant, and the range and unit definition, it creates a separate task for each fiscal value in the given range.

The next picture shows the selection screen of the Extractor 2.0 process with the parameters relevant for Split data based on fiscal period.

You need to fill in these parameters:

  • Time field for partitioning - Field of the source structure used to split the data into portions.

  • Time shift for start - Number of time units defining the start of the range relative to the current date.
    E.g. if you provide 10 as the value and use Posting Period(s) as the unit, the range starts 10 posting periods before the current date.

  • Fiscal year variant - Fiscal year variant used for the fiscal period definition. You can use the provided F4 help to select one of the fiscal year variants available in the system.

  • Step size - Number of time units covered by a single range.
    With the selection from the image above (step defined as 1 and unit as Posting Period(s)), each extraction task covers only the values matching a single posting period.

  • Time shift end - Number of time units defining the end of the range relative to the current date.
    E.g. if you provide 5 as the value and use Posting Period(s) as the unit, the range ends 5 posting periods before the current date.

  • Unit - Unit for the range calculation. Posting Period(s) and Fiscal Year(s) are the supported units for the fiscal period based data split. The unit is not explicitly labeled on the selection screen but appears as a dropdown list next to the previous parameters.
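
For the unit Posting Period(s), the range calculation can be sketched in Python as follows. The sketch makes simplifying assumptions that the system does not necessarily share: the fiscal year variant has a fixed number of posting periods per fiscal year (12 here, no special periods), and the current fiscal year and period are passed in by the caller instead of being derived from the date through the variant. All names are hypothetical.

```python
def shift_period(year, period, shift, periods_per_year=12):
    """Return the (fiscal_year, period) lying `shift` periods earlier."""
    index = year * periods_per_year + (period - 1) - shift
    return index // periods_per_year, index % periods_per_year + 1


def fiscal_portions(cur_year, cur_period, shift_start, shift_end, step,
                    periods_per_year=12):
    """Return one ((year, period), (year, period)) low/high pair per task."""
    portions = []
    shift = shift_start
    while shift >= shift_end:
        # Last period covered by this range, never past the end shift.
        last_shift = max(shift - step + 1, shift_end)
        low = shift_period(cur_year, cur_period, shift, periods_per_year)
        high = shift_period(cur_year, cur_period, last_shift, periods_per_year)
        portions.append((low, high))
        shift -= step
    return portions


# Current period 003/2024, start 10 periods back, end 5 periods back.
for low, high in fiscal_portions(2024, 3, 10, 5, 1):
    print(low, high)
```

The period index arithmetic handles the wrap-around across fiscal years, so a range that starts in one fiscal year can end in the next.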