
The chapter Replication to files explained which files are written to file-based storage:

  • Schema metadata (“_S.json” files) - information about the schema, such as columns, data types, keys, etc.

  • Request metadata (“_M.json” files) - information about a specific request (or load), such as the delta type, load type, or volume

  • Data files (CSV or Parquet)

You can adjust the naming convention in the SNP Glue™ Profile Settings.

Within this chapter, we’ll explain how to use the provided metadata to recognize deltas and do proper merges (for example in Spark-based applications like Databricks).

Schema metadata - KeyFields

To build your merge condition, you need to know which columns are key fields. Data files such as CSV or Parquet do not hold this information, so you have to read the schema metadata file and filter for the fields where the “KeyFlag” value is “X”.

More information can be found in the chapter (Glue-2202) Create SNP Glue™ table on File storage
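As an illustration, reading the key fields could look like the following sketch. The JSON layout (a list of field descriptions with a “FieldName” property) and the function name are assumptions; only filtering on “KeyFlag” = “X” comes from this page.

import json

# Sketch: collect the key fields from a schema metadata ("_S.json") file.
# The list-of-fields layout and the "FieldName" property are assumptions;
# only the KeyFlag = "X" convention is taken from this page.
def read_key_fields(schema_path: str) -> list[str]:
    with open(schema_path) as fh:
        fields = json.load(fh)
    return [f["FieldName"] for f in fields if f.get("KeyFlag") == "X"]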

Request metadata - LoadType

The property "LoadType" is a single character that identifies you the extraction load type:

  • “I” - Delta Init Without Data

  • “L” - Delta Init + Full Load

  • “D” - Delta Load

  • “F” - Full Load (Repair) Without Delta Update

  • “R” - Recovery of Previous Deltas (https://datavard.atlassian.net/l/c/X1UGkpNh only)

More details can be found here: https://datavard.atlassian.net/l/c/7FUaNemb
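As an illustration, the load type (and, later, the delta type) can be read from the request metadata like this. The flat key/value JSON layout and the function name are assumptions; only the property names “LoadType” and “DeltaType” come from this page.

import json

# Sketch: read the request metadata ("_M.json") and return the properties used below.
# The flat JSON layout is an assumption; only the property names are from this page.
def read_request_metadata(request_path: str) -> tuple[str, str]:
    with open(request_path) as fh:
        meta = json.load(fh)
    return meta["LoadType"], meta["DeltaType"]   # e.g. ("D", "TRIGGER")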

Request metadata - DeltaType

The property “DeltaType” tells you which delta mechanism was used for reading the data from the source object. Which delta types you can expect depends on the SAP source object and the SNP Glue™ fetcher used to read it:

  • SAP Object
      • SAP Table - SAP Table Fetcher
      • Table View / Hana View - SAP Table Fetcher, Hana View Fetcher
      • CDS View - SAP Table Fetcher, ODP Fetcher
      • Extractor - ODP Fetcher

  • BW Object
      • DSO - ODP Fetcher, DSO Fetcher, Listcube Fetcher
      • aDSO - ODP Fetcher, aDSO Fetcher
      • BEx Query - BEx Query Fetcher

For each delta type, the overview below lists the indication field (with the values that flag inserts, updates, and deletes), the load types it can occur with, whether deduplication applies, and additional comments:

  • FULL - no indication field; load type: F; deduplication: no. Full load - no delta.

  • TRIGGER - indication field /DVD/GL_DELFLAG (Insert: empty, Update: empty, Delete: “D”); load types: D, F, I, L; deduplication: yes. Delta is captured by database triggers and stored in a shadow table (keys only). During a delta replication, the shadow table is deduplicated (latest entries) and joined to the source table to get the data itself. https://datavard.atlassian.net/l/c/HV9UoPXX https://datavard.atlassian.net/l/c/mnhf3fwV

  • VALUE - no indication field; load types: D, F, I, L; deduplication: no. Delta is captured by field values such as a creation or change date or an increasing key number - https://datavard.atlassian.net/l/c/xiyji801. Based on the field you choose, you can capture creates and/or changes. Deletes can only be recognized if the table itself has a delete column; hard deletes on the database level cannot be captured by this method.

  • VALUE_DIST - no indication field; load types: D, F, I, L.

  • DATE - no indication field; load types: D, F, I, L.

  • TMSTMP - no indication field; load types: D, F, I, L.

  • CHANGELOG - indication field RODMUPDMOD (Insert: “N”, Update: empty, Delete: “D”); load types: D, F, I, L; deduplication: optional - (Glue-2202) DSO Fetcher. Recordmodes can be selected in the https://datavard.atlassian.net/l/c/8ZtgHTeL. Refer to the table below.

  • REQUEST - indication field RODMUPDMOD (Insert: “N”, Update: empty, Delete: “D”); load types: D, F, I, L.

  • SLT - indication field /DVD/GL_DELFLAG (Insert: “I”, Update: “U”, Delete: “D”; “A” = Archive); deduplication: optional.

  • ODP_DELTA - indication field ODQ_CHANGEMODE (Insert: “C”, Update: “U”, Delete: “D”); load types: D, F, I, L, R. Both ODQ_CHANGEMODE and ODQ_ENTITYCNTR are available. Refer to the table below. https://datavard.atlassian.net/l/c/X1UGkpNh
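To use this overview in code, a small lookup (a hypothetical helper; the dictionary name and shape are mine, the field names and values come from the overview above) can map each delta type that carries an indication field to its delete marker:

# Hypothetical lookup derived from the overview above: per DeltaType, the indication
# field and the value that marks a deleted record. Delta types without an indication
# field (FULL, VALUE, VALUE_DIST, DATE, TMSTMP) are not listed.
DELETE_INDICATORS = {
    "TRIGGER":   ("/DVD/GL_DELFLAG", "D"),
    "CHANGELOG": ("RODMUPDMOD", "D"),
    "REQUEST":   ("RODMUPDMOD", "D"),
    "SLT":       ("/DVD/GL_DELFLAG", "D"),
    "ODP_DELTA": ("ODQ_CHANGEMODE", "D"),
}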

Doing the merge

Combining the information from above, a merge could be done like this (pseudocode):

if deltaType = "Trigger":

  if loadType = "L" or "F":
    -> overwrite
    
  if loadType = "D":
    merge "MERGETABLE" with "DELTALOAD" on <Keyfields>
      when matched and "DELTALOAD"."INDICATIONFIELD" = "D" then DELETE
      when matched and "DELTALOAD"."INDICATIONFIELD" <> "D" then UPDATE
      when not matched and "DELTALOAD"."INDICATIONFIELD" <> "D" then INSERT
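As an illustration only, here is a minimal PySpark sketch of this trigger-delta merge using the Delta Lake Python API. The function name and parameters are assumptions, as is the idea that the key fields, load type, and delta records were already read from the metadata and data files as described above; only the overwrite/merge logic and the /DVD/GL_DELFLAG delete marker come from this page.

from delta.tables import DeltaTable

# Minimal sketch (assumes an existing SparkSession `spark`, a target Delta table at
# `target_path`, the replicated delta records loaded into `delta_df`, and `key_fields`
# taken from the "_S.json" schema metadata).
def merge_trigger_request(spark, target_path, delta_df, key_fields, load_type):
    if load_type in ("L", "F"):
        # delta init + full load / full load: overwrite the target table
        delta_df.write.format("delta").mode("overwrite").save(target_path)
    elif load_type == "D":
        # delta load: merge on the key fields, /DVD/GL_DELFLAG = 'D' marks deletions
        condition = " AND ".join(f"t.{k} = s.{k}" for k in key_fields)
        (DeltaTable.forPath(spark, target_path).alias("t")
            .merge(delta_df.alias("s"), condition)
            .whenMatchedDelete(condition="s.`/DVD/GL_DELFLAG` = 'D'")
            .whenMatchedUpdateAll(condition="s.`/DVD/GL_DELFLAG` <> 'D'")
            .whenNotMatchedInsertAll(condition="s.`/DVD/GL_DELFLAG` <> 'D'")
            .execute())
        # note: updateAll/insertAll also write the indication field column, so the
        # target table needs that column (or drop it from delta_df beforehand)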

Merge to http://delta.io tables

SNP provides a Python notebook that can be used as a template to show how the merge can be done on Delta tables in a Spark environment like Databricks or Azure Synapse. The reading of the metadata and the creation of the merge statement are done dynamically. It is used like this:

GlueRequest(... path to request ...'_m.json').writeToDelta(... path to delta table ... '')

Please ask our Support team or Consultants to provide the latest version of the demo Spark notebook.
