(Glue-2102) Extraction to File storage

The extraction from an SAP object to file storage has a lot in common with the standard https://datavard.atlassian.net/wiki/spaces/GLUE/pages/1465843839 extraction. The same steps are required to execute the extraction. However, there are some differences in how the data and metadata are stored on file storage, and this page focuses on these differences. To set up the extraction, follow the steps described on the page https://datavard.atlassian.net/wiki/spaces/GLUE/pages/1465843839.

Glue tables on file storage

As in the standard SAP table to Storage scenario, a Glue table must be created before the extraction objects can be created. However, because the storage is file-based, no database table-like object can be created to hold the data. Therefore, only metadata in JSON format, describing the Glue table, is stored on the target storage during Glue table activation. For more information about creating a Glue table on file storage, see the page https://datavard.atlassian.net/wiki/spaces/GLUE/pages/2120319157. You can also find more information about Glue table metadata in the section JSON metadata structure.

Extraction

During the extraction process itself, two types of files are created: data files and metadata files for the extraction (Glue request). Which files are created depends on the settings defined in File storage options. The following sections describe the particular files and how they can be configured.

Data files

Default configuration

In the default configuration, one CSV data file is created for each transferred package. Each data file has an iteration (package) number appended to its name and is stored in the destination defined by File storage options within the Glue table. The data file naming can be configured in Glue Settings - Binary Storage settings using the parameter Binary data file name.
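As a consumer-side illustration, the following sketch reads all per-package CSV files from a local directory in package order. The file name pattern `<prefix>_<package>.csv`, the comma delimiter, the UTF-8 encoding and the function name are assumptions made for this example only; the actual names depend on the Binary data file name parameter.

```python
import csv
import re
from pathlib import Path


def read_package_files(directory, prefix):
    """Yield rows from all per-package CSV files, ordered by package number.

    Assumes (hypothetically) that files are named '<prefix>_<package>.csv'.
    """
    pattern = re.compile(re.escape(prefix) + r"_(\d+)\.csv")
    packages = []
    for path in Path(directory).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            packages.append((int(match.group(1)), path))
    # Sort numerically so that package 10 comes after package 9
    for _, path in sorted(packages, key=lambda item: item[0]):
        with path.open(newline="", encoding="utf-8") as handle:
            yield from csv.reader(handle)
```

Sorting by the numeric package number rather than by file name avoids the lexicographic ordering problem where `_10` would sort before `_2`.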

One file extraction enabled

If One file extraction is enabled in a Glue table, the iteration (package) number is not included in the filename, so all packages of an extraction are written into one file. However, two different extractions will still create two files, depending on the Glue Settings - Binary Storage settings. Also, this option cannot be combined with the Compress file option.

File compression enabled

If you use the Compress file option, the data files are compressed during extraction before being transferred to the target storage. The content of each file remains in CSV format, but the file is compressed and stored as a GZIP file on the target storage.
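Because a compressed data file is a regular GZIP stream containing CSV text, it can be read with standard tooling. The following is a minimal sketch of a consumer, assuming the file has already been downloaded to a local path and that it uses a comma delimiter and UTF-8 encoding; the function name is hypothetical and both defaults may need to be adjusted to the actual Glue table settings.

```python
import csv
import gzip


def read_gzip_csv(path, delimiter=","):
    """Return all rows of a GZIP-compressed CSV file as lists of strings."""
    # Mode 'rt' decompresses and decodes to text in one step
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=delimiter))
```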

Handling of empty data files

In the default configuration, if an extraction reads 0 lines from the source object (e.g. an empty delta load), no data file is transferred to the target storage. However, this behavior can be customized in Glue Settings - Binary Storage settings using the parameter Generate CSV data file for empty load. If empty data files are enabled, even an empty load creates a data file on the target storage, respecting the other settings of the Glue table (e.g. Include table header).

Metadata files

Similar to Glue tables, each extraction to file storage generates a metadata file in JSON format that describes the extraction in technical detail.

JSON Example

{
  "GlueRequest": {
    "Request": "116123",
    "Status": "S",
    "GlueTable": "ZGLUE_TABLE",
    "ContainerPath": "",
    "Extractor": "ZGLUE_TABLE_P",
    "UserName": "MSIMASEK",
    "StartTime": "20200715135945.0694020",
    "FinishTime": "20200715135948.2475040",
    "LinesRead": "1000",
    "LinesTransferred": "1000",
    "JobName": "ZDVD_GLUEZGLUE_TABLE_P",
    "JobCount": "15594400",
    "DeltaType": "FULL",
    "LoadType": "F"
  }
}

Attributes explanation

The extraction metadata is encapsulated within a single JSON object with a single attribute called "GlueRequest". This attribute is itself a JSON object and contains the following information about the Glue extraction:

  • "Request" - Number of the Glue request associated with this extraction

  • "Status" - Single character status of extraction

    • "E" - Ended with error

    • "S" - Ended Successfully

    • "F" - Failed

  • "GlueTable" - Name of the target Glue table

  • "ContainerPath" - Path to container (location) on file storage

  • "Extractor" - Technical name of the Glue extraction process that is used for the extraction

  • "UserName" - SAP user that executed the extraction

  • "StartTime" - Timestamp when extraction started

  • "FinishTime" - Timestamp when extraction finished

  • "LinesRead" - Number of records that were read from the source object

  • "LinesTransferred" - Number of records transferred to target storage

  • "JobName" - Technical name of an SAP background job that processed the extraction

  • "JobCount" - Numeric identifier of an SAP background job that processed the extraction

  • "DeltaType" - Delta used for reading the data from the source object

  • "LoadType" - Single character identifier of extraction load type

    • "I" - Delta Init Without Data

    • "L" - Delta Init + Full Load

    • "D" - Delta Load

    • "F" - Full Load (Repair) Without Delta Update
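To illustrate how these attributes might be consumed, the following sketch parses a request metadata document and derives a few typed values. It assumes the timestamp layout `YYYYMMDDhhmmss.fffffff` seen in the example above and treats the timestamps as naive (no time zone); the function names and the returned keys are hypothetical and not part of Glue.

```python
import json
from datetime import datetime


def parse_glue_timestamp(value):
    """Parse a timestamp such as '20200715135945.0694020'."""
    # datetime supports at most 6 fractional digits, so drop anything beyond
    return datetime.strptime(value[:21], "%Y%m%d%H%M%S.%f")


def summarize_request(metadata_text):
    """Extract a few typed fields from a Glue request metadata document."""
    request = json.loads(metadata_text)["GlueRequest"]
    start = parse_glue_timestamp(request["StartTime"])
    finish = parse_glue_timestamp(request["FinishTime"])
    # All values are stored as JSON strings, so counters must be converted
    lines_read = int(request["LinesRead"])
    lines_transferred = int(request["LinesTransferred"])
    return {
        "request": request["Request"],
        "successful": request["Status"] == "S",
        "duration_seconds": (finish - start).total_seconds(),
        "complete": lines_read == lines_transferred,
    }
```

Comparing "LinesRead" with "LinesTransferred" is one simple consistency check a downstream process could apply before loading the data files of a request.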

Metadata files configuration

With the default configuration, one metadata file is created for each Glue request transferred to storage, and the metadata JSON files are stored in the location defined in File storage options.

The name of the metadata files can be configured in Glue Settings - Binary storage settings using the parameter Binary request metadata file name.

The generation of JSON metadata files for empty loads can also be switched on or off in Glue Settings - Binary storage settings using the parameter Generate JSON metadata file for empty load. If it is switched on (default value = "X"), the JSON metadata is generated and transferred to the target storage even for an extraction that read 0 records from the source object (e.g. an empty delta load). If it is switched off, no metadata is generated or transferred for such loads.
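The empty-load rules for data and metadata files can be summarized in a small sketch. The function and argument names below are hypothetical; the defaults mirror the behavior described on this page (no empty CSV data file by default, JSON metadata for empty loads enabled by default).

```python
def expected_files(lines_read, empty_csv=False, empty_json=True):
    """Return which file types an extraction is expected to leave on storage.

    empty_csv  - stands for 'Generate CSV data file for empty load'
    empty_json - stands for 'Generate JSON metadata file for empty load'
    """
    has_data = lines_read > 0
    return {
        "data": has_data or empty_csv,
        "metadata": has_data or empty_json,
    }
```

For example, `expected_files(0)` describes an empty delta load under the default settings: a metadata file is expected, but no data file.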