(SM 2011) S3/Redshift Storage Setup
Prerequisites
Open Ports
In a controlled network environment, it is common to have firewall rules in place. In order to enable communication of SAP systems with AWS, the following port numbers should be reachable from the SAP system:
Port | Type | AWS service |
---|---|---|
5439 | tcp | Redshift |
80/443 | http/https | S3 |
These are the default port numbers of the AWS services.
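To confirm that the firewall rules allow these connections, a TCP connection test can be run from the SAP host. The following is a minimal sketch, not part of the Datavard tooling; the function name is illustrative, and the endpoint host names you pass in depend on your own cluster and region.

```python
import socket


def is_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `is_port_reachable("<your-redshift-endpoint>", 5439)` and `is_port_reachable("<your-s3-endpoint>", 443)` should both return True once the rules are in place. A successful TCP handshake only shows that the port is open; it does not verify credentials.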
AWS User
We recommend creating distinct users for every SAP system connected to the AWS services in order to isolate each system's data.
The recommended user name mirrors SAP's naming convention for the <sid>adm user: <sid>adm => <sid>aws.
S3 bucket
You must create an S3 bucket manually, using the AWS console. Datavard Storage Management does not create it automatically.
- The customer has to provide the details needed to connect to the AWS S3 service, including the security key pair ("access_key_id", "secret_key_id")
Redshift cluster and database
You must create a Redshift cluster together with the Redshift database.
We recommend creating a dedicated database in Redshift for each SAP system. The recommended database name is sap<sid> (for example, sapdvq for the system DVQ).
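The two naming recommendations (<sid>aws for the AWS user, sap<sid> for the Redshift database) can be derived from the SAP system ID. The helper below is an illustrative sketch; the function names are hypothetical, and the validation assumes the usual SAP rule that a SID is three alphanumeric characters starting with a letter.

```python
import re


def _normalize_sid(sid: str) -> str:
    """Validate an SAP system ID and return it in lower case."""
    # Assumption: a SID is three alphanumeric characters, the first a letter.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]{2}", sid):
        raise ValueError(f"not a valid SAP system ID: {sid!r}")
    return sid.lower()


def aws_user_name(sid: str) -> str:
    """Recommended AWS user name: <sid>aws."""
    return f"{_normalize_sid(sid)}aws"


def redshift_db_name(sid: str) -> str:
    """Recommended Redshift database name: sap<sid>."""
    return f"sap{_normalize_sid(sid)}"
```

For the system DVQ used in this guide, the helpers yield dvqaws and sapdvq.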
Redshift database user
You must grant the Redshift database user select permission on the system tables that Storage Management queries. To compute table sizes: "grant select on pg_catalog.SVV_TABLE_INFO to DVD_USER;"
Also make sure that the Datavard user can run select on the table pg_catalog.PG_TABLE_DEF, which is used to check whether a table exists: "grant select on pg_catalog.PG_TABLE_DEF to DVD_USER;"
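If several SAP systems each use their own Redshift user, the two grant statements can be generated per user rather than typed by hand. This is only a sketch: the function name is hypothetical, the user-name check is a conservative assumption rather than the full Redshift identifier rule, and the statements still have to be executed by a sufficiently privileged Redshift user.

```python
import re

# System tables named in this section.
SYSTEM_TABLES = (
    "pg_catalog.SVV_TABLE_INFO",  # table size
    "pg_catalog.PG_TABLE_DEF",    # table existence
)


def grant_statements(user: str) -> list[str]:
    """Build the select grants for one Redshift database user."""
    # Conservative check to avoid building SQL from unexpected input.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", user):
        raise ValueError(f"unexpected user name: {user!r}")
    return [f"grant select on {table} to {user};" for table in SYSTEM_TABLES]
```

Calling `grant_statements("DVD_USER")` returns the two statements quoted above.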
OS prerequisites (On SAP host)
This group of requirements relates to the operating systems underlying the SAP system with all its application servers. Datavard products (e.g. Datavard Glue, OutBoard DataTiering) have been developed and tested on SUSE Linux and Windows Server 2012. However, by design, they are not tied to a particular operating system, provided that the requirements listed in this guide are met.
OS directories
Datavard connector uses a directory dedicated to its configuration files:
$ ls -ld /sapmnt/DVQ/global/security/dvd_conn
drwx------ 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/
The folder is used to store drivers and is shared among the SAP application servers. Set its owner to <sid>adm and restrict the permissions accordingly.
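The ownership and permission setup can be checked programmatically on each application server. The sketch below assumes that the intended mode is the owner-only rwx------ (0700) shown in the listing above; the function names are illustrative.

```python
import os
import stat


def mode_string(path: str) -> str:
    """Return the mode in ls style, e.g. 'drwx------'."""
    return stat.filemode(os.stat(path).st_mode)


def is_private_dir(path: str) -> bool:
    """True if path is a directory accessible only by its owner (mode 0700)."""
    st = os.stat(path)
    return stat.S_ISDIR(st.st_mode) and stat.S_IMODE(st.st_mode) == 0o700
```

Running `is_private_dir("/sapmnt/DVQ/global/security/dvd_conn")` as <sid>adm on each application server also confirms that the shared directory is mounted there, because `os.stat` raises an error if the path does not exist.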
JDBC Drivers
The JDBC protocol is used to connect to AWS Redshift. The AWS Redshift JDBC driver (RedshiftJDBC41-no-awssdk-1.2.16.1027.jar) must be stored manually on the operating system and be accessible to the Datavard connector.
We recommend storing the drivers in a folder within the connector directory, organized in sub-folders to avoid possible conflicts.
$ ls -ld /sapmnt/DVQ/global/security/dvd_conn/*
drwxr-xr-x 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/aws
$ ls -l /sapmnt/DVQ/global/security/dvd_conn/aws
drwxr-xr-x 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/aws/RedshiftJDBC41-no-awssdk-1.2.16.1027.jar
SSL Certificates for Java
When JDBC is used over SSL, the connection relies on a certification authority that is part of the standard Java installation (Starfield Technologies). Browser test link: https://good.sca0a.amazontrust.com/ .
This means that no additional certificates are needed.
Java connector
The Java connector is a critical middleware component. Please follow the steps in the chapter Java Connector Setup to set it up before you continue.
Configuration
When all prerequisites are fulfilled, further configuration is performed from the SAP system.
Drivers logical file definition
As described in JDBC Drivers, the JDBC drivers for the AWS service connection are stored on the operating system underlying the SAP system. Also define them as logical names in the SAP system via the FILE transaction.
In our example, we are using the S3 and Redshift JDBC drivers provided by AWS. The definition of the driver-specific folders looks as follows:
ZDVD_AWS_REDSHIFT_DRIVERS refers to the folder in which AWS JDBC drivers provided by Amazon have been placed in the section JDBC Drivers.
Storage Management setup
The setup uses a generic Datavard software component, “Reuse Library”. The required component is “Storage Management”.
Datavard Storage Management facilitates transparent communication with different types of storages. This includes S3 for flat files and Redshift for structured data.
S3 storage
In order to transparently store data, you should define two types of AWS storages in Storage Management:
- S3 storage which facilitates a transfer of files to S3
- Redshift storage which enables data replication between SAP tables and Redshift tables
Create S3 storage through the transaction:
/DVD/SM_SETUP
Entries explained:
- Storage ID – name of the storage
- Storage Type – choose AWS_S3 for S3
- Description – extended description of the storage for easier identification
- AWS Bucket name – name of the existing bucket in S3
- AWS Region – region where the bucket exists (we recommend that the Redshift cluster is located in the same region)
- AWS Access Key – security information "access_key_id"
- AWS Secret Key – security information "secret_key_id"
- Java connector RFC – TCP/IP RFC destination for communication with Datavard Java connector
- Path for TMP files – directory on the SAP system where the temporary files will be stored
The path for TMP files must be visible to the instance of the Java connector. If your SAP system is a cluster consisting of multiple physical machines, you need to configure NFS (Network File System). This ensures that all application servers write temporary data to one shared location that is visible to the Java connector instance. With this configuration, you can perform storage operations on the S3 storage regardless of the SAP application server in use. /sapmnt is usually an NFS directory shared among all SAP application servers.
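Before entering the path, it can help to verify that the directory exists and is writable from each host involved. The following is a minimal sketch with a hypothetical function name; it tests only the host it runs on, so it would need to be executed on every application server and on the host of the Java connector.

```python
import os
import tempfile


def can_write_tmp_path(path: str) -> bool:
    """True if path is an existing directory in which a file can be created."""
    if not os.path.isdir(path):
        return False
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path, prefix="dvd_tmp_check_") as fh:
            fh.write(b"ok")
            fh.flush()
        return True
    except OSError:
        return False
```

A True result on every host shows that the directory is writable there, but it does not prove that the hosts see the same shared location; that still has to be confirmed through the NFS mount configuration.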
Complete the creation of the storage by confirming (F8).
Redshift storage
The AWS Redshift storage is created in the same way as the S3 storage, but with different settings:
Entries explained:
- Storage ID – Name of the storage
- Storage Type – Choose REDSHIFT storage type
- Description – Extended description of the storage for easier identification
- Referenced Storage – Defines which S3 storage will be used by Redshift
- Redshift host – Redshift server hosting the Redshift service
- Port – port number on which Redshift service is accessible
- Database name – Name of DB in Redshift cluster
- Database schema – Name of the schema (normally public)
- Java connector RFC – AWS RFC destination (you may use the same one as for S3 storage)
- Driver engine – Use REDSHIFT
- Driver Classname – Classname of the driver used for loading (the recent version is com.amazon.redshift.jdbc41.Driver)
- Driver path – Logical name of the driver directory
- Username – Redshift user created in the Redshift user group
- Password hashed – Type in the password in the lower line and use the [Hash] button
- Login timeout (seconds) – Threshold for JDBC timeout
- Enable SSL – Checked if SSL authentication should be used
- Use extended escaping – Checked if extended escaping should be used (replaces escape characters, such as newline, backspace, tabulator, etc., by the 'space' character)
- SSL Mode – There are two options for SSL mode:
  - verify-ca (default option, verifies that the certificate comes from trusted CA)
  - verify-full (both CA and hostname listed in the certificate are verified)
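The effect of the extended escaping option on a field value can be illustrated as follows. This is a sketch of the described behavior, not Datavard's implementation: the exact set of replaced characters is an assumption (the description names newline, backspace and tabulator and leaves the rest open), and carriage return is included here only as a plausible guess.

```python
# Assumed set of escape characters: newline, tab, backspace, plus carriage
# return (the last one is a guess, not confirmed by the description).
ESCAPE_CHARS = "\n\t\b\r"
_TRANSLATION = str.maketrans({ch: " " for ch in ESCAPE_CHARS})


def extended_escape(value: str) -> str:
    """Replace each escape character in value with a single space."""
    return value.translate(_TRANSLATION)
```

For instance, a value containing a line break between "first" and "second" becomes "first second", so the field no longer spans multiple lines during loading.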
Finish the creation of the storage by confirming (F8). If the SAP system is able to authenticate against AWS Redshift and receives the expected result of the SQL command 'use database', the creation of the storage is considered successful.