(Glue-2311) Merge data replicated in file format (handle delta)

An earlier chapter explained which files are written to file-based storage:

  • Schema metadata (_S.json files): information about the schema, such as columns, data types, and keys.

  • Request metadata (_M.json files): information about a specific request (or load) like delta type, load type, or volume.

  • Data files (CSV or Parquet).

You can adjust the naming convention in the SNP Glue™ Profile Settings.

In this chapter, we explain how to use the provided metadata to recognize deltas and perform proper merges (for example, in Spark-based applications like Databricks).

Schema metadata - KeyFields

To build your merge condition, you must know which columns are key fields. Data files such as CSV or Parquet do not hold this information, so you have to read the schema metadata file and filter for the fields where the KeyFlag value is X.

More information can be found in the chapter Create SNP Glue™ table on File storage | JSON-metadata-structure.
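To illustrate, a key-field lookup might look like the following Python sketch. The attribute names used here ("Fields", "FieldName", "KeyFlag") are assumptions modeled on the JSON metadata structure described in that chapter; verify them against your own _S.json files. The file path is a placeholder.

import json

# Read the schema metadata (_S.json) and return the key field names.
# Key fields are the columns whose KeyFlag value is "X".
def get_key_fields(schema_path):
    with open(schema_path, encoding="utf-8") as f:
        schema = json.load(f)
    return [
        field["FieldName"]
        for field in schema["Fields"]
        if field.get("KeyFlag") == "X"
    ]

# Build the ON clause of a merge statement from the key fields.
key_fields = get_key_fields("/data/glue/MYTABLE/MYTABLE_S.json")
merge_condition = " AND ".join(f"target.{c} = source.{c}" for c in key_fields)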

Request metadata - LoadType

The property LoadType is a single character that identifies the extraction load type:

  • I: Delta Init Without Data.

  • L: Delta Init + Full Load.

  • D: Delta Load.

  • F: Full Load (Repair) Without Delta Update.

  • R: Recovery of Previous Deltas only.

More information can be found in the chapter .
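As a minimal illustration, the load type can be read from the request metadata and mapped to a write strategy. The file name is a placeholder, and the assumption that LoadType is a top-level JSON property should be verified against your _M.json files.

import json

# Read the request metadata (_M.json) and pick a write strategy.
with open("/data/glue/MYTABLE/20240101120000_M.json", encoding="utf-8") as f:
    request_meta = json.load(f)

load_type = request_meta["LoadType"]

if load_type in ("F", "L"):
    strategy = "overwrite"  # full load (with or without delta init)
elif load_type in ("D", "R"):
    strategy = "merge"      # delta load or recovery of previous deltas
else:
    strategy = "skip"       # "I": delta init without data - nothing to merge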

Request metadata - DeltaType

The property DeltaType tells you which delta mechanism was used to read the data from the source object. The tables below show which fetchers are used for each SAP source object and which DeltaType values you can expect:

Objects

| SAP Object             | SNP Glue™ Fetcher                    |
|------------------------|--------------------------------------|
| SAP Table              | SAP Table Fetcher                    |
| Table View / Hana View | SAP Table Fetcher, Hana View Fetcher |
| CDS View               | SAP Table Fetcher, ODP Fetcher       |
| Extractor              | ODP Fetcher                          |
| BW Object              | Listcube Fetcher, ODP Fetcher        |
| BW Object - DSO        | ODP Fetcher, DSO Fetcher             |
| BW Object - aDSO       | ODP Fetcher, aDSO Fetcher            |
| BW Object - BEx Query  | BEx Query Fetcher                    |

| DeltaType  | Source objects                                          | Indication field (type) | Load type     | Insert | Update | Delete | Deduplication                | Comment |
|------------|---------------------------------------------------------|-------------------------|---------------|--------|--------|--------|------------------------------|---------|
| FULL       | All objects                                             | –                       | F             | –      | –      | –      | No                           | Full load - no delta. |
| TRIGGER    | SAP Table                                               | /DVD/GL_DELFLAG         | D, F, I, L    | ""     | ""     | D      | Yes                          | Delta is captured by database triggers and stored in a shadow table (keys only). During a delta replication, the shadow table is deduplicated (latest entries) and joined to the source table to get the data itself. |
| VALUE      | SAP Table, Table View / Hana View, CDS View, BW Object  | –                       | D, F, I, L    | –      | –      | –      | No                           | Delta is captured by field values, such as a creation or change date or an increasing key number. Based on the field you choose, you can capture new and/or changed records. Deletes can only be recognized if the table itself has a delete column; hard deletes on the database level cannot be captured by this method. |
| VALUE_DIST | SAP Table, Table View / Hana View, CDS View             | –                       | D, F, I, L    | –      | –      | –      | No                           | See VALUE. |
| DATE       | SAP Table, Table View / Hana View, CDS View, BW Object  | –                       | D, F, I, L    | –      | –      | –      | No                           | See VALUE. |
| TMSTMP     | SAP Table, Table View / Hana View, CDS View, BW Object  | –                       | D, F, I, L    | –      | –      | –      | No                           | See VALUE. |
| CHANGELOG  | DSO                                                     | RODMUPDMOD              | D, F, I, L    | N      | ""     | D      | Optional - DSO deduplication | Recordmodes can be selected in the . Refer to the table below. |
| REQUEST    | aDSO                                                    | RODMUPDMOD              | D, F, I, L    | N      | ""     | D      | Optional - DSO deduplication | Recordmodes can be selected in the . Refer to the table below. |
| SLT        | SAP Table                                               | /DVD/GL_DELFLAG         | –             | I      | U      | D      | Optional                     | A = Archive. |
| ODP_DELTA  | CDS View, Extractor, BW Object, DSO, aDSO               | ODQ_CHANGEMODE          | D, F, I, L, R | C      | U      | D      | –                            | ODQ_CHANGEMODE and ODQ_ENTITYCNTR are available. Refer to the table below. |

The Insert, Update, and Delete columns show the value of the indication field for each operation type; "" denotes an empty value and – means not applicable.

ODQ Changemode and BW Recordmode Overview from SAP
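Before merging, DeltaTypes with deduplication (see the Deduplication column above) can deliver several records per key, of which only the latest may be applied. A minimal PySpark sketch follows; the ordering column GLUE_SEQUENCE is a placeholder for whatever technical sequence or timestamp column your extraction writes (not an actual SNP Glue™ column name), and key_fields comes from the schema-metadata sketch above.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Read the delta data files of one request (Parquet in this example).
delta_df = spark.read.parquet("/data/glue/MYTABLE/20240101120000/")

# Keep only the latest record per key before merging.
w = Window.partitionBy(*key_fields).orderBy(F.col("GLUE_SEQUENCE").desc())
deduplicated = (
    delta_df.withColumn("_rn", F.row_number().over(w))
    .where(F.col("_rn") == 1)
    .drop("_rn")
)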

Doing the merge

Combining the information from above, a merge could be done like this (pseudocode):

if deltaType = "TRIGGER":
    if loadType = "L" or "F":
        -> overwrite
    if loadType = "D":
        -> merge "MERGETABLE" with "DELTALOAD" on <KeyFields>
           when matched and "DELTALOAD"."INDICATIONFIELD" = "D" then DELETE
           when matched and "DELTALOAD"."INDICATIONFIELD" <> "D" then UPDATE
           when not matched and "DELTALOAD"."INDICATIONFIELD" <> "D" then INSERT
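For Delta Lake targets, the same logic might look like the following PySpark sketch using the delta-spark DeltaTable API. It reuses spark, load_type, merge_condition, and deduplicated from the sketches above; the target path is a placeholder, and /DVD/GL_DELFLAG is the indication field of the TRIGGER DeltaType from the table above.

from delta.tables import DeltaTable

TARGET_PATH = "/data/lake/MYTABLE"

if load_type in ("F", "L"):
    # Full load: overwrite the target table.
    deduplicated.write.format("delta").mode("overwrite").save(TARGET_PATH)
elif load_type == "D":
    # Delta load: merge into the target on the key fields.
    (
        DeltaTable.forPath(spark, TARGET_PATH).alias("target")
        .merge(deduplicated.alias("source"), merge_condition)
        # Matched key flagged as deleted -> remove the row.
        .whenMatchedDelete(condition="source.`/DVD/GL_DELFLAG` = 'D'")
        # Matched key, not a delete -> update all columns.
        .whenMatchedUpdateAll()
        # New key, not a delete -> insert the row.
        .whenNotMatchedInsertAll(condition="source.`/DVD/GL_DELFLAG` <> 'D'")
        .execute()
    )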

 

Merge to delta.io tables

SNP provides a Python notebook that can be used as a template showing how the merge can be done on Delta tables in a Spark environment such as Databricks or Azure Synapse. The reading of metadata and the creation of merge statements is done dynamically. It is used like this:

GlueRequest(... path to request ...'_m.json').writeToDelta(... path to delta table ... '')

Ask our Support team or Consultants to provide the latest version of the demo Spark notebook.