App Architecture
SAP → Snowflake communication
https://datavard.atlassian.net/wiki/spaces/DATAVARD/pages/5110731989
Data Organization
In the application, data is structured into sources, each linked to a single storage connection from SNP Glue. Each source represents a schema and maintains its own instance of settings, allowing data to be managed separately across different environments, such as development, quality, and production.
Sub-Schemas for Logical Separation
A source can include logical sub-schemas, which organize the generated merge objects into different schemas. This lets related tables, such as a staging table and its merge table, share the same name while remaining separate objects. These sub-schemas are:
Automatically created when tables are set up
Automatically removed during housekeeping if they remain empty
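The sub-schema lifecycle described above can be sketched with a small in-memory model. This is only an illustration: the `Source` class, its method names, and the sub-schema names `STAGING` and `MERGE` are hypothetical and do not come from the application.

```python
class Source:
    """Toy model of a source and its logical sub-schemas (hypothetical names)."""

    def __init__(self, name):
        self.name = name
        self.sub_schemas = {}  # sub-schema name -> set of table names

    def create_table(self, sub_schema, table):
        # The sub-schema is created implicitly when the first table is set up in it.
        self.sub_schemas.setdefault(sub_schema, set()).add(table)

    def drop_table(self, sub_schema, table):
        # Dropping a table leaves the (possibly empty) sub-schema in place.
        self.sub_schemas.get(sub_schema, set()).discard(table)

    def housekeeping(self):
        # Remove every sub-schema that no longer contains any table.
        empty = sorted(s for s, tables in self.sub_schemas.items() if not tables)
        for s in empty:
            del self.sub_schemas[s]
        return empty


src = Source("DEV")
src.create_table("STAGING", "MARA")  # same table name ...
src.create_table("MERGE", "MARA")    # ... in two separate sub-schemas
src.drop_table("STAGING", "MARA")
removed = src.housekeeping()
```

After the run, `removed` lists only the emptied sub-schema, while the sub-schema that still holds a table is kept.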
Controller Tasks
Each source includes controller tasks, which are serverless tasks that manage internal operations, e.g. triggering merge execution based on the size of the delta.
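As a rough sketch of what a delta-size-based trigger could look like, the snippet below selects the tables whose pending delta exceeds a threshold. The row-count metric, the `min_delta_rows` threshold, and all names are assumptions made for illustration; the text above only states that the decision depends on the size of the delta.

```python
from dataclasses import dataclass


@dataclass
class StreamState:
    table: str
    pending_rows: int  # rows captured by the stream and not yet merged (assumed metric)


def tables_to_merge(states, min_delta_rows=10_000):
    """Return the tables whose pending delta is large enough to justify a merge."""
    return [s.table for s in states if s.pending_rows >= min_delta_rows]


states = [
    StreamState("MARA", 25_000),
    StreamState("KNA1", 12),
    StreamState("VBAK", 10_000),
]
due = tables_to_merge(states)
```

Here `due` contains `MARA` and `VBAK`, while the small delta on `KNA1` is left to accumulate.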
Merge objects
Each staging table is created together with four merge objects:
Merge table: Table with de-duplicated data
A backup is created during the deletion of its staging table. The retention period can be set in the Source settings (https://datavard.atlassian.net/wiki/spaces/DATAVARD/pages/5726085003/App+Settings#Source-settings).
Stream: Captures delta on the staging table
View: Provides real-time de-duplication merging delta from the stream and the merge table
Task: Merges delta from the stream into the merge table. It should be run as infrequently as possible.
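The relationship between the view and the task can be illustrated with plain Python lists standing in for the merge table and the stream. This is a minimal sketch under stated assumptions: each row has a key column (`id`) and an ordering column (`seq`) that identifies the newest version, and deletions are not modelled. Both column names and both function names are hypothetical.

```python
def deduplicated_view(merge_rows, delta_rows, key="id", order="seq"):
    """Return one row per key, keeping the row with the highest `order` value."""
    latest = {}
    for row in list(merge_rows) + list(delta_rows):
        current = latest.get(row[key])
        # On equal `order` values the later row wins, so delta rows override merged ones.
        if current is None or row[order] >= current[order]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


def run_merge_task(merge_rows, delta_rows, key="id", order="seq"):
    """Persist the de-duplicated result; the consumed delta becomes empty."""
    return deduplicated_view(merge_rows, delta_rows, key, order), []


merge_table = [{"id": 1, "seq": 1, "val": "a"}, {"id": 2, "seq": 1, "val": "b"}]
stream = [
    {"id": 2, "seq": 2, "val": "b2"},
    {"id": 3, "seq": 1, "val": "c"},
    {"id": 2, "seq": 3, "val": "b3"},
]

before = deduplicated_view(merge_table, stream)    # what the view shows right now
merge_table, stream = run_merge_task(merge_table, stream)
after = deduplicated_view(merge_table, stream)     # what the view shows after the merge
```

In this model the view returns the same rows before and after the task runs, which is why the task can be scheduled infrequently without affecting what readers of the view see.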
Housekeeping
Housekeeping is responsible for the following operations:
Deleting backup tables
Deleting old metadata
Deleting empty sub-schemas (https://datavard.atlassian.net/wiki/spaces/GLUE/pages/4288118864/App+Architecture#Sub-Schemas-for-Logical-Separation).
Deleting already merged data from staging tables. This can be turned on/off in the Source settings (https://datavard.atlassian.net/wiki/spaces/DATAVARD/pages/5726085003/App+Settings#Source-settings).
Deleting old statistics. The retention period can be set in the Source settings (https://datavard.atlassian.net/wiki/spaces/DATAVARD/pages/5726085003/App+Settings#Source-settings).
Preventing staleness in empty streams. This can be turned on/off in the Source settings (https://datavard.atlassian.net/wiki/spaces/DATAVARD/pages/5726085003/App+Settings#Source-settings).
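The retention-based deletions in this list (backup tables, statistics) follow the same pattern: anything older than the configured retention period is selected for removal. A minimal sketch, assuming each item carries a creation timestamp and the retention period is expressed in days; the function name, the backup table names, and the data layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expired(items, retention_days, now=None):
    """Return the names of items created before the retention cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [name for name, created_at in items if created_at < cutoff]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
backups = [
    ("MARA_BACKUP_1", datetime(2024, 6, 1, tzinfo=timezone.utc)),
    ("MARA_BACKUP_2", datetime(2024, 6, 28, tzinfo=timezone.utc)),
]
to_delete = expired(backups, retention_days=7, now=now)
```

With a seven-day retention period, only the backup created on 1 June falls before the cutoff and is selected.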