(SM-2411) Snowflake Storage Setup

This page provides general guidelines on how to set up Snowflake Storage to work with Storage Management.

 

With the introduction of Snowpipe Streaming, the connection setup procedure is slightly different compared to the previous Batch Load data replication method.
The crucial differentiator is whether the Snowflake Ingest SDK is used. Because this SDK contains an internal driver manager that conflicts with the default one used by the SNP Java Connector, Snowpipe Streaming needs to be fully separated from all other connections.
This is further explained in the instructions below.

1. General prerequisites

1.1 Network Connectivity

To enable communication between SAP systems and the Snowflake environment, a set of hosts and ports related to your Snowflake account must be reachable from the SAP system.

Executing SELECT SYSTEM$ALLOWLIST(); or SELECT SYSTEM$ALLOWLIST_PRIVATELINK(); generates a .csv file with the list of hosts and ports involved in the communication.
All communication is outbound from the SAP perspective.

Reference to Snowflake documentation: SnowCD (Connectivity Diagnostic Tool) | Snowflake Documentation.
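As a sketch, the allowlist output can be exported to a file and fed to the SnowCD diagnostic tool from the SAP host. The snowsql output flags and the file name below are illustrative, and the placeholders must be replaced with your account values; verify the exact export options against the SnowCD documentation:

```
# Export the allowlist (output flags are illustrative) and run SnowCD
# against it from the SAP host to verify network connectivity:
snowsql -a <account> -u <user> -o friendly=false -o header=false -o output_format=plain \
  -q 'SELECT SYSTEM$ALLOWLIST();' > allowlist.json
./snowcd allowlist.json
```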

1.2 Warehouse and Database

Snowflake warehouse and database must already exist.

1.3 Snowflake user

A Snowflake technical user must be created with full privileges on the chosen Snowflake database schema.
The recommended privilege for the Snowflake technical user is OWNERSHIP of the Snowflake database.
If ownership of the database cannot be assigned to the technical user, the minimal requirements are the following:

GRANT ALL ON DATABASE <DATABASE> TO ROLE <ROLE>;
GRANT USAGE ON WAREHOUSE <WAREHOUSE> TO ROLE <ROLE>;
GRANT ALL ON SCHEMA <SCHEMA> TO ROLE <ROLE>;

There should be a one-to-one relation between Snowflake schemas and connected SAP systems, ensuring isolation of the replicated data.
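The schema-per-system recommendation can be sketched with the SnowSQL CLI as follows; the schema names SAP_ECC and SAP_BW are hypothetical examples, and the placeholders must be replaced with your own database and role names:

```
# One dedicated schema per connected SAP system, granted to the
# technical user's role (schema names are hypothetical examples):
snowsql -a <account> -u <admin_user> -q "
  CREATE SCHEMA <DATABASE>.SAP_ECC;
  GRANT ALL ON SCHEMA <DATABASE>.SAP_ECC TO ROLE <ROLE>;
  CREATE SCHEMA <DATABASE>.SAP_BW;
  GRANT ALL ON SCHEMA <DATABASE>.SAP_BW TO ROLE <ROLE>;
"
```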

2. OS prerequisites (On SAP host)

This group of requirements relates to the operating systems underlying the SAP system with all its application servers.
SNP products (e.g. SNP Glue™, SNP OutBoard™ Data Tiering) have been developed and tested in SUSE Linux environments and on Windows Server 2012.
However, by design they are not limited to a particular operating system, as long as the requirements listed in this guide are met.

2.1 OS directory with JDBC Driver and/or Snowflake Ingest SDK

The JDBC protocol is used to connect to Snowflake. The Snowflake JDBC driver needs to be manually copied to the operating system and be accessible to the Java connector.
Download the Snowflake JDBC driver. The latest version of the driver is recommended, except for the Snowpipe Streaming storage connection: the streaming replication method currently requires JDBC driver versions 3.16.1 through 3.18.0 in combination with snowflake-ingest-sdk-2.2.0.jar.
It is recommended to store the drivers in a filesystem shared between the application servers, organized into sub-directories to avoid possible conflicts (some customers connect to multiple different platforms/services). Note that while the path to the driver directory is defined in the storage connection (SAP GUI), the drivers themselves are loaded by the Java connector component.

Sample directory with the JDBC driver for standard Batch Load method:

$ ls -l /sapmnt/<SID>/drivers/snowflake
-rwxr-xr-x 1 <sid>adm root 69376363 Jan 15 09:45 snowflake-jdbc-3.18.0.jar

Sample directory with both JDBC driver and Ingest SDK for Snowpipe Streaming:

$ ls -l /sapmnt/<SID>/drivers/snowflake_streaming
-rw-r--r-- 1 <sid>adm sapsys 52768695 Dec  1 16:31 snowflake-ingest-sdk-2.2.0.jar
-rw-r--r-- 1 <sid>adm sapsys 33471655 Jan 31 11:53 snowflake-jdbc-3.18.0.jar

Set the ownership and permissions of the directory and its contents appropriately, e.g. to <sid>adm[:sapsys].
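The directory layout above can be sketched with the commands below. To keep the sketch safe to run anywhere, it uses a temporary directory and empty placeholder files; in a real setup, replace DRIVER_BASE with your shared /sapmnt path and copy the downloaded JARs instead of touching empty files.

```shell
# Sketch of the driver directory layout; placeholder files stand in
# for the real JARs so the commands can be run safely.
DRIVER_BASE="$(mktemp -d)"
mkdir -p "$DRIVER_BASE/snowflake" "$DRIVER_BASE/snowflake_streaming"
touch "$DRIVER_BASE/snowflake/snowflake-jdbc-3.18.0.jar"
touch "$DRIVER_BASE/snowflake_streaming/snowflake-jdbc-3.18.0.jar"
touch "$DRIVER_BASE/snowflake_streaming/snowflake-ingest-sdk-2.2.0.jar"
# A production system would additionally set ownership, as root:
#   chown -R <sid>adm:sapsys "$DRIVER_BASE"
chmod -R 755 "$DRIVER_BASE"
ls -lR "$DRIVER_BASE"
```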

For compatibility with the SNP Java connector, the Snowflake JDBC driver filename must begin with the prefix snowflake-jdbc. If there are multiple versions of the Snowflake driver present, the latest version will be utilized. It is advisable to keep only one version of the JDBC driver within the Snowflake driver directory.

The Snowflake Ingest SDK JAR filename must begin with the prefix snowflake-ingest. Similarly, if there are multiple versions of the Snowflake Ingest SDK available, the most recently added version will be used. It is recommended to keep only one version of both the JDBC driver and the Snowflake Ingest SDK within the Snowflake streaming driver directory.

Older driver versions (3.13.x) had an issue with the SSL handshake; it was fixed in version snowflake-jdbc-3.13.21.
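To check which driver version a directory actually provides, the Snowflake JDBC JAR can be invoked directly and prints its version (path and version below are placeholders):

```
# Verify the driver version provided by the JAR in the driver directory:
java -jar /sapmnt/<SID>/drivers/snowflake/snowflake-jdbc-3.18.0.jar --version
```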

3. SAP configuration

3.1 JAVA connector

The Java connector is a critical middleware component. Follow the steps in Java Connector Setup to set it up or upgrade it to a new version before you continue.

The choice of Java runtime environment is narrowed by the Snowflake JDBC driver requirements: it must be installed in a 64-bit environment and requires Java 1.8 or higher. Link to the official page: JDBC Driver | Snowflake Documentation.

3.2 Storage Management Setup

The final step in SAP & Snowflake connectivity is the creation of appropriate storage connections in transaction /DVD/SM_SETUP.

For the Batch Load replication method, connections of the SNOW_STAGE and SNOWFLAKE types need to be configured (the first one for data file upload, the second for SQL communication with the Snowflake API):

3.2.1 Snowflake Stage (SNOW_STAGE)

Binary storage pointing to the internal stage of the Snowflake user. It is used for data transfer, leveraging the extended functionality of the Snowflake JDBC driver.

Storage ID: Logical name of the storage connection.
Storage type: SNOW_STAGE (Snowflake internal stage).
Java connector RFC: TCP/IP RFC destination used for communication with Java connector.
Account name: Name of the Snowflake account without the .snowflakecomputing.com domain, in the format <account name>.<region>.<platform> (more information can be found in the Snowflake account name doc).
User role: The user's role used for data insertion.
JAVA Call Repeat: Number of times failed calls should be retried.
Repeat delay (seconds): Delay between retried calls.
Custom location: When enabled, domain .snowflakecomputing.cn will be used. When disabled, the default domain .snowflakecomputing.com is used.
Driver path: Absolute path to the directory containing the Snowflake JDBC driver.
NOTE: In previous versions of Storage Management, SAP logical file of type DIR (directory) was used. This is now automatically converted to a physical path.
Connection pool size: Number of connections that can be kept open in the pool, reducing resource-expensive establishment of JDBC connections.
Username: Snowflake user.
Password: User password.

If Key Pair authentication is established according to Key-pair authentication and key-pair rotation | Snowflake Documentation, it can be configured with or without a passphrase as follows:
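A minimal sketch of the key-pair generation, using the openssl commands described in the Snowflake key-pair authentication documentation. The file names rsa_key.p8 and rsa_key.pub are conventional, not mandatory:

```shell
# Generate a key pair for the Snowflake technical user in a scratch directory.
cd "$(mktemp -d)"
# Unencrypted private key (no passphrase); for an encrypted key, drop
# -nocrypt and add e.g. "-v2 aes-256-cbc" (you will be prompted):
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
# Matching public key, to be registered on the Snowflake user with:
#   ALTER USER <user> SET RSA_PUBLIC_KEY='<contents of rsa_key.pub>';
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
ls -l rsa_key.p8 rsa_key.pub
```

The private key file is what the storage connection's key file location points to; whether a passphrase (Password field) is needed depends on whether the key was generated encrypted.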

3.2.2 Snowflake Storage (SNOWFLAKE)

Transparent storage pointing to the Snowflake database. SQL commands and data are sent using the Snowflake JDBC driver.

Storage ID: Logical name of the storage connection.
Storage type: SNOWFLAKE (Snowflake transparent storage).
Referenced storage: Storage ID of Snowflake stage area for temporary file upload/download.
Java connector RFC: RFC Destination used for communication with JCo.
Account name: <account name>.<region>.<platform> (more information can be found in Snowflake account name doc).
Warehouse: Optional parameter; select an existing warehouse in the Snowflake account to be used.
Database: Name of an existing database in Snowflake.
Database schema: Name of an existing database schema in Snowflake.
User role: The user's role used for data insertion.
Custom location: When enabled, domain .snowflakecomputing.cn will be used. When disabled, the default domain .snowflakecomputing.com is used.
Driver path: Snowflake driver directory.
Hints: Additional string added to connection string when JDBC driver establishes a connection.
Connection pool size: Number of connections that can be kept open in the pool, reducing resource-expensive establishment of JDBC connections.
Table name prefix: Optional text value for naming prefix of all Glue tables created within this storage.
Use Transient tables: When enabled, only transient tables will be created on the storage.
Use Snowflake App for data merge: When enabled, the SNP Native app will be used.
Snowflake App name: Name of the existing Snowflake native App to be used.
Wrap values in staged CSV files: Depending on table content, it may be necessary to encapsulate field values to avoid errors during INSERT into the Snowflake table from staging CSV files.
Data delivery guarantee: Data transfer behavior. More information can be found in the chapter https://datavard.atlassian.net/wiki/spaces/ReuseLib/pages/4119724050.
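As an illustration of the Hints parameter, a hypothetical value combining two Snowflake JDBC connection parameters is shown below; verify the parameter names against the JDBC driver documentation for your driver version:

```
CLIENT_SESSION_KEEP_ALIVE=true&networkTimeout=300000
```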

 

For the Snowpipe Streaming replication method, a single connection of the SNOWSTREAM type is sufficient:

3.2.3 Snowpipe Streaming Storage (SNOWSTREAM)

Transparent storage pointing to the Snowflake database. SQL commands are sent using the Snowflake JDBC driver, while data is inserted into the Snowflake database using the Snowflake Ingest SDK. For more information about Snowpipe Streaming requirements, limitations, and configuration, visit the official Snowpipe Streaming documentation.

Storage ID: Logical name of the storage connection.
Storage type: SNOWSTREAM (Snowflake transparent storage - Streaming).
Java connector RFC: RFC Destination used for communication with JCo.
Account name: <account name>.<region>.<platform> (more information can be found in Snowflake account name doc).
Warehouse: Optional parameter; select an existing warehouse in the Snowflake account to be used.
Database: Name of an existing database in Snowflake.
Database schema: Name of an existing database schema in Snowflake.
User role: The user's role used for data insertion.
Custom location: When enabled, domain .snowflakecomputing.cn will be used. When disabled, the default domain .snowflakecomputing.com is used.
Driver path: Snowflake driver directory. The Snowflake Ingest SDK JAR file must also be present in this directory. The SDK is available for download from the Maven Central Repository.
Hints: Additional string added to connection string when JDBC driver establishes a connection.
Connection pool size: Number of connections that can be kept open in the pool, reducing resource-expensive establishment of JDBC connections.
Conversion Type: The type of data conversion to be used when sending data to SNP JCo. For most scenarios, we recommend using CSV conversion due to its stability and reliability.

CSV Conversion (default):

Performance: Slower than JCO table conversion.
Stability: Highly stable with no data inconsistencies expected.

JCO Table Conversion:

Performance: Better performance than CSV conversion.
Stability: This solution is unstable and can lead to inconsistencies between SAP and SNP JCo.

Username: Snowflake user.
Private Key file location: Path to Private Key file.
Private Key Encrypted: When enabled, a Password input field will be displayed.
Password: Password to Private key.
Use Snowflake App for data merge: When enabled, the SNP Native app will be used.
Snowflake App name: Name of the existing Snowflake native App to be used.
Table name prefix: Optional text value for naming prefix of all tables created within this storage.
Data delivery guarantee: Data transfer behavior. More information can be found in the chapter https://datavard.atlassian.net/wiki/spaces/ReuseLib/pages/4119724050.
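The Ingest SDK JAR referenced above can be fetched directly into the streaming driver directory. The URL below is assumed from standard Maven Central conventions for the net.snowflake group, and <SID> is a placeholder for your system ID:

```
# Download the Ingest SDK JAR from Maven Central into the streaming
# driver directory (URL pattern assumed from Maven Central conventions):
curl -L -o /sapmnt/<SID>/drivers/snowflake_streaming/snowflake-ingest-sdk-2.2.0.jar \
  https://repo1.maven.org/maven2/net/snowflake/snowflake-ingest-sdk/2.2.0/snowflake-ingest-sdk-2.2.0.jar
```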