(Glue-2111) Create a Glue table on File storage
Creating a Glue table on File storage requires a few additional steps compared with creating a table on transparent storage. The common procedure is described in the chapter Create a Glue Table; during the selection of the Storage ID, follow these steps:
Fill in the Storage ID field with a valid File storage ID. The File storage options button should appear; you might need to press Enter after entering the Storage ID for the button to show.
Press the File storage options button. The following popup appears:
This is the default state; if confirmed, the Profile mapping logic is used.
If the second radio button, Specific profile, is selected, the popup changes accordingly: an existing Profile name must be entered; the F4 help can be used to find it.
If the last (third) radio button, Custom options, is selected, the popup changes accordingly. Only in this case is the next (third) step necessary.
Fill in the parameters based on your requirements, or keep the default values if suitable.
Container name – Directory where files generated by Glue are stored (obligatory parameter).
You can also specify a path to the container by using a slash (/) as a separator between containers.
There’s also the option of using Glue placeholders in the container name.
File type – File format used to store the data (CSV or PARQUET).
CSV options:
Delimiter type – Character which separates values in generated .csv files. This character can be selected from F4 help (obligatory parameter).
Put values into quotes – Encloses values in quotes (use when the delimiter character may appear in the data).
Use extending escaping (optional) – Replaces escape characters, such as newline, backspace, tab, etc., with the space character. The replacement set can be customized in the table /DVD/RL_EXT_ESC, and a default set of values can be filled via transaction /DVD/RL_DEFAULT_ESC.
Include header (optional) – Includes a header line at the beginning of the file with data.
One file extraction (optional) – Keeps all data in one file (data from one extraction is stored in only one file).
Compress file (optional) – Compresses the file with gzip. This feature cannot be selected if the parameter One file extraction is enabled, and vice versa.
Compression level – Compression level of the gzip file. Available values can be selected from F4 help. The default value is 5. The parameter will appear only if the Compress file option is checked.
PARQUET options:
Create Delta Lake metadata – Metadata JSON files are created in the “_delta_log” subfolder to enable Delta Lake processing in Spark.
Confirm your selection. You can now continue with Step 2 of the Create a Glue table procedure.
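The effect of the CSV options described above can be illustrated with a short sketch. This is not Glue's actual implementation; it only mimics what a generated file could look like with a semicolon delimiter, Put values into quotes, Include header, and Compress file enabled (the file and field names are illustrative):

```python
import csv
import gzip

# Illustrative only: mimics the effect of the CSV options described above
# (delimiter, quotes, header, gzip compression). Not Glue's actual code.
rows = [
    {"CALMONTH": "202401", "AMOUNT": "10.50"},
    {"CALMONTH": "202402", "AMOUNT": "7;25"},  # value contains the delimiter
]

with gzip.open("zglue_table_0001.csv.gz", "wt", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["CALMONTH", "AMOUNT"],
        delimiter=";",           # "Delimiter type"
        quoting=csv.QUOTE_ALL,   # "Put values into quotes"
    )
    writer.writeheader()         # "Include header"
    writer.writerows(rows)
```

Because the second row contains the delimiter inside a value, the quoting option is what keeps the file parseable by downstream readers.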
Glue table on File storage explained
On file-based storages, no structures like database tables are created; data is stored at a specific location in .csv files. To keep track of the Glue tables created, a .json metadata file is generated at a specific location (defined in Container name) during Glue table activation. A similar metadata file is also transferred when an alter operation is performed on the Glue table. The naming convention for the metadata file can be customized via the Glue Settings – Binary Storage settings parameter called Binary table metadata file name.
JSON metadata structure
The metadata for each Glue table created on file storage is stored in a JSON file and contains a technical description of the Glue table. This file can be used by an external processing application to process the Glue table correctly.
In the next section, we discuss the JSON metadata structure and the meaning of particular attributes.
Example file content
{
  "Table": {
    "TableName": "ZGLUE_TABLE",
    "Description": "Example Glue Table",
    "DevClass": "ZMS_TEST",
    "StorageID": "AWS_Q1",
    "Operation": "CREATE"
  },
  "Fields": [
    {
      "FieldName": "CALMONTH",
      "Position": "0002",
      "KeyFlag": "X",
      "RollName": "/BI0/OICALMONTH",
      "DataType": "NUMC",
      "Length": "000006",
      "Decimals": "000000",
      "Domain": "",
      "Description": "Calendar year/month",
      "Partitioning": "",
      "ReferenceFieldName": ""
    }
  ]
}
Attributes description
The whole table metadata is encapsulated in a single JSON object that consists of two main attributes, named “Table” and “Fields”.
Attribute “Table” contains general information about the table, represented as standard JSON attributes in “name”: “value” format. These attributes are:
“TableName” - The name of the Glue table provided during the Glue table creation process
“Description” - Description assigned to the Glue table during the creation process
“DevClass” - Development class (package) in SAP where the Glue table was created
“StorageID” - Technical ID of the storage used for the Glue table creation
“Operation” - Action performed on the Glue table. The value CREATE indicates that the table was created; the value ALTER indicates that changes were made to the active table
Attribute “Fields” contains information about all fields defined within the Glue table. The information is represented by an array of JSON objects, each of which contains the following attributes:
“FieldName” - Technical name of a field defined within the Glue table
“Position” - Position of the field within the Glue table definition. It also represents the position of the field (column) within the CSV data file
“KeyFlag” - Indicates whether the field is defined as a key field. The value “X” indicates a key field; the value “” indicates a non-key field
“RollName” - Name of the data element defined in the SAP DDIC that is assigned to the field of the Glue table
“DataType” - ABAP Dictionary data type of the field
“Length” - Length/character length of the field in the ABAP Dictionary
“Decimals” - Number of decimal places
“Domain” - Name of the ABAP Dictionary domain
“Description” - Meaningful description of the field provided by the user/DDIC data element
“Partitioning” - Indicates whether the field is used as a partitioning field. “X” - true, “” - false
“ReferencedFieldName” - Referenced field name used for the Data Enrichment functionality
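An external processing application consuming this metadata might use the attributes above as follows. This is a minimal sketch, assuming the metadata JSON has already been retrieved from the container; the field values are illustrative:

```python
import json

# Minimal sketch: parse Glue table metadata JSON (as described above) and
# derive the CSV column order from the "Position" attribute. Illustrative only.
metadata_json = """
{
  "Table": {"TableName": "ZGLUE_TABLE", "Operation": "CREATE"},
  "Fields": [
    {"FieldName": "AMOUNT", "Position": "0003", "KeyFlag": "", "DataType": "CURR"},
    {"FieldName": "CALMONTH", "Position": "0002", "KeyFlag": "X", "DataType": "NUMC"}
  ]
}
"""

meta = json.loads(metadata_json)

# Sort fields by "Position" to reconstruct the column order of the CSV data files.
fields = sorted(meta["Fields"], key=lambda f: int(f["Position"]))
columns = [f["FieldName"] for f in fields]

# "KeyFlag" = "X" marks key fields, "" marks non-key fields.
key_fields = [f["FieldName"] for f in fields if f["KeyFlag"] == "X"]

print(columns)      # column order in the CSV data files
print(key_fields)   # key fields of the Glue table
```

Sorting by “Position” matters because the array order in the JSON file is not guaranteed to match the column order of the generated CSV files.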