(Glue-2002) Q: We are running a replication from SAP to an RDBMS target (e.g. Microsoft SQL Server). During the extraction, we receive duplicate records. How can we avoid this?

When Hadoop is the target storage, duplicate key conflicts rarely occur because most Hadoop-based SQL engines do not provide ACID guarantees, and some do not define table keys at all. Classical RDBMS (such as MSS, Oracle, Sybase ASE, …), however, use primary keys to identify each table row uniquely.

To use primary keys on a data target for Glue-based data replication, do the following:

  1. Create the target table and import the relevant SAP fields into it
  2. Optionally, remove the field GLREQUEST
  3. Ensure that the table key is defined properly on the target table
  4. Activate the target table
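
The effect of step 3 can be sketched outside of Glue. The following minimal example assumes SQLite (via Python's sqlite3 module) as a stand-in for the RDBMS target. The table name zmara_copy and the choice of the fields MANDT, MATNR and MTART are hypothetical and only serve as an illustration; the sketch does not show how Glue itself creates or activates tables. It shows that once the key is defined on the target table, a second plain insert for the same key is rejected instead of producing a duplicate record.

```python
import sqlite3


def create_target_table(conn):
    # Hypothetical target table; the primary key mirrors the SAP table key.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS zmara_copy (
            mandt TEXT NOT NULL,
            matnr TEXT NOT NULL,
            mtart TEXT,
            PRIMARY KEY (mandt, matnr)
        )
        """
    )


def try_insert(conn, row):
    """Return True if the row was inserted, False if the key already exists."""
    try:
        conn.execute(
            "INSERT INTO zmara_copy (mandt, matnr, mtart) VALUES (?, ?, ?)", row
        )
        return True
    except sqlite3.IntegrityError:
        # The database rejects a second row with the same primary key.
        return False


conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the RDBMS
create_target_table(conn)
first = try_insert(conn, ("100", "M-01", "FERT"))
second = try_insert(conn, ("100", "M-01", "HALB"))
row_count = conn.execute("SELECT COUNT(*) FROM zmara_copy").fetchone()[0]
```

Without the primary key definition, both inserts would succeed and the table would contain two rows for the same SAP record.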

If you still receive duplicate records when extracting data, make sure that the storage management connection you are using is update-enabled. To do this, go to the storage management settings, switch to "edit" mode and open the detail screen of your external storage definition. The checkbox option "Enable Update" must be set. Otherwise, Glue only performs inserts, which may lead to duplicate keys for records that have been updated in SAP.
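
The difference between insert-only and update-enabled writing can be illustrated with a small sketch. It assumes SQLite as a stand-in for the RDBMS target; the function write_record, its enable_update parameter and the table layout are hypothetical and do not reflect how Glue implements the "Enable Update" option internally. The sketch only demonstrates the general pattern: with updates enabled, an existing key is updated, and without them the second write for the same key runs into a duplicate key.

```python
import sqlite3


def new_target():
    # Hypothetical target table with the SAP key defined as primary key.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE target (mandt TEXT NOT NULL, matnr TEXT NOT NULL, "
        "mtart TEXT, PRIMARY KEY (mandt, matnr))"
    )
    return conn


def write_record(conn, row, enable_update):
    """Write one record; hypothetical model of insert-only vs. update-enabled."""
    mandt, matnr, mtart = row
    if enable_update:
        # Try to update an existing row with the same key first.
        cur = conn.execute(
            "UPDATE target SET mtart = ? WHERE mandt = ? AND matnr = ?",
            (mtart, mandt, matnr),
        )
        if cur.rowcount > 0:
            return "updated"
    try:
        conn.execute(
            "INSERT INTO target (mandt, matnr, mtart) VALUES (?, ?, ?)", row
        )
        return "inserted"
    except sqlite3.IntegrityError:
        return "duplicate key"


# The same SAP record arrives twice: initial extraction, then a changed version.
records = [("100", "M-01", "FERT"), ("100", "M-01", "HALB")]

insert_only = new_target()
insert_only_results = [write_record(insert_only, r, False) for r in records]

update_enabled = new_target()
update_results = [write_record(update_enabled, r, True) for r in records]
final_value = update_enabled.execute("SELECT mtart FROM target").fetchone()[0]
```

In this sketch, the insert-only run ends with a duplicate key for the changed record, while the update-enabled run overwrites the existing row with the latest value.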

Note: You may want to remove the field GLREQUEST from the key fields so that the data on the external storage reflects the SAP data 1:1.
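
Why the key definition matters here can be sketched as follows. The example assumes that GLREQUEST holds an identifier of the extraction request, so that the same SAP record arrives with a different GLREQUEST value in each run. This is an assumption for illustration, as are the request IDs, the table layout and the use of SQLite with INSERT OR REPLACE as a stand-in for an update-enabled RDBMS target.

```python
import sqlite3


def load(key_columns):
    """Load the same SAP record from two requests using the given primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE target (glrequest TEXT NOT NULL, mandt TEXT NOT NULL, "
        f"matnr TEXT NOT NULL, mtart TEXT, PRIMARY KEY ({key_columns}))"
    )
    # Hypothetical request IDs; the SAP key (mandt, matnr) is identical.
    extracts = [
        ("REQ_0001", "100", "M-01", "FERT"),
        ("REQ_0002", "100", "M-01", "HALB"),
    ]
    for row in extracts:
        # Replace the row if the primary key already exists, else insert it.
        conn.execute("INSERT OR REPLACE INTO target VALUES (?, ?, ?, ?)", row)
    return conn.execute(
        "SELECT glrequest, mtart FROM target ORDER BY glrequest"
    ).fetchall()


# GLREQUEST is part of the key: each request creates its own row.
with_request_key = load("glrequest, mandt, matnr")

# GLREQUEST is not part of the key: one row per SAP record, latest value wins.
without_request_key = load("mandt, matnr")
```

With GLREQUEST in the key, the target keeps one row per request for the same SAP record. Without it, the target holds exactly one row per SAP key, which matches the 1:1 picture described in the note.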