(SM-2002) S3/Redshift Storage Setup
Prerequisites
Open Ports
In a controlled network environment it is common to have firewall rules in place. To enable communication between the SAP system and AWS, the following ports must be reachable from the SAP system:
Port | Type | AWS service
---|---|---
5439 | tcp | Redshift
80/443 | http/https | S3
These are the default port numbers of the respective AWS services.
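Reachability can be verified from the SAP host before the setup. A minimal sketch using bash's built-in /dev/tcp (the endpoint in the example is a placeholder for your actual cluster):

```shell
# Minimal TCP reachability check using bash's /dev/tcp; no extra tools needed.
check_port() {
  # $1 = host, $2 = port; returns 0 if a TCP connection succeeds within 5 seconds
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example (replace with your actual Redshift endpoint):
# check_port example-cluster.eu-west-1.redshift.amazonaws.com 5439 && echo reachable
```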
AWS User
We recommend creating distinct users for every SAP system connected to the AWS services in order to isolate each system's data.
The recommended user names mirror SAP's guideline for user names: <sid>adm => <sid>hdp.
S3 bucket
You must create the S3 bucket manually using the AWS console; Datavard Storage Management does not create it automatically.
- The customer has to provide the details needed to connect to the AWS S3 service, including the security pair ("access_key_id", "secret_key_id")
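Besides the console, the bucket can also be created with the AWS CLI. A sketch, with the bucket name and region as illustrative values:

```shell
# Sketch: create the S3 bucket with the AWS CLI (bucket name and region are
# examples). Storage Management will not create the bucket for you.
create_sap_bucket() {
  aws s3 mb "s3://$1" --region "$2"
}

# Example (requires a configured AWS CLI with credentials):
# create_sap_bucket my-sap-dvq-bucket eu-west-1
```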
Redshift cluster and database
You must create a Redshift cluster together with a Redshift database.
We recommend creating a dedicated database in Redshift for each SAP system. The recommended database name is sap<sid> (e.g. sapdvq).
Redshift database user
You must grant the Datavard database user (dvd_load in the examples below) SELECT on certain system tables in the Redshift database, which SAP Storage Management queries:
- Table size: grant select on pg_catalog.SVV_TABLE_INFO to dvd_load;
- Table existence: grant select on pg_catalog.PG_TABLE_DEF to dvd_load;
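Because Redshift speaks the PostgreSQL wire protocol, the grants above can also be applied from a shell via psql. A sketch in which the host, database, and admin user names are illustrative:

```shell
# Sketch: apply the grants via psql (host/database/user values are examples).
grant_dvd_select() {
  psql "host=$1 port=5439 dbname=$2 user=$3" \
    -c 'grant select on pg_catalog.SVV_TABLE_INFO to dvd_load;' \
    -c 'grant select on pg_catalog.PG_TABLE_DEF to dvd_load;'
}

# Example:
# grant_dvd_select example-cluster.eu-west-1.redshift.amazonaws.com sapdvq masteruser
```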
OS prerequisites (On SAP host)
This group of requirements relates to the operating system underlying the SAP system and all its application servers. Datavard products (e.g. Datavard Glue, OutBoard DataTiering) have been developed and tested on SUSE Linux and Windows Server 2012. However, by design they are not limited to a particular operating system, as long as the requirements listed in this guide are met.
OS directories
Datavard connector uses a directory dedicated to its configuration files:
$ ls -ld /sapmnt/DVQ/global/security/dvd_conn
drwx------ 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn
The folder is used to store drivers and is shared among the SAP application servers. Set the ownership to <sid>adm and restrict the permissions accordingly.
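The directory creation can be sketched as follows; the path and ownership mirror the DVQ example above (changing ownership requires root):

```shell
# Create the connector directory with restrictive permissions (drwx------).
setup_conn_dir() {
  mkdir -p "$1"
  chmod 700 "$1"
  # Ownership goes to <sid>adm, e.g.: chown dvqadm:sapsys "$1"  (requires root)
}

# Example:
# setup_conn_dir /sapmnt/DVQ/global/security/dvd_conn
```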
JDBC Drivers
The JDBC protocol is used to connect to AWS Redshift. The AWS Redshift JDBC driver (RedshiftJDBC41-no-awssdk-1.2.16.1027.jar) must be stored manually on the operating system and be accessible to the Datavard connector.
We recommend storing the drivers in a folder within the connector directory, organized in sub-folders to avoid possible conflicts.
$ ls -ld /sapmnt/DVQ/global/security/dvd_conn/*
drwxr-xr-x 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/aws
$ ls -l /sapmnt/DVQ/global/security/dvd_conn/aws
-rw-r--r-- 1 dvqadm sapsys --- --- RedshiftJDBC41-no-awssdk-1.2.16.1027.jar
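Once the driver is in place, the connector reaches the cluster through a JDBC URL of the standard Redshift form (endpoint and database are placeholders for your own values):

```
jdbc:redshift://<cluster-endpoint>:5439/<database>
```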
SSL Certificates for Java
When using JDBC over SSL, the certification authority used (Starfield Technologies) is part of the standard Java installation, so no additional certificates are needed. Browser test link: https://good.sca0a.amazontrust.com/.
Java connector
The Java connector is a critical middleware component. Follow the steps in the Java Connector Setup guide to set it up before you continue.
Configuration
When all prerequisites are fulfilled, further configuration is performed from the SAP system.
Drivers logical file definition
As described in the section JDBC Drivers, the JDBC drivers for the AWS service connection are stored on the operating system underlying the SAP system. Define them also as logical names in the SAP system via the FILE transaction.
In our example, we are using the S3 and Redshift JDBC drivers provided by AWS. The definition of the driver-specific folders looks as follows:
ZDVD_AWS_REDSHIFT_DRIVERS refers to the folder in which the AWS JDBC drivers were placed in the section JDBC Drivers.
Password hash generator
Use the report /DVD/XOR_GEN for this purpose.
Storage Management setup
A generic Datavard software component, the “Reuse Library”, is used for the setup; the required component is “Storage Management”.
Datavard Storage Management facilitates transparent communication with different types of storages. This includes various types of databases, Hadoop, and AWS: S3 for flat files and Redshift for structured data.
S3 storage
To store data transparently, you should define two types of AWS storage in Storage Management:
- S3 storage which facilitates a transfer of files to S3
- Redshift storage which enables data replication between SAP tables and Redshift tables
Create S3 storage through the transaction:
/DVD/SM_SETUP > [Edit mode] > [New storage]
Entries explained:
- Storage ID – name of the storage
- Storage Type – choose AWS_S3 for S3
- Description – extended description of the storage for easier identification
- AWS Bucket name – name of the existing bucket in S3
- AWS Region – region where the bucket exists (preferably, the Redshift cluster should be in the same region)
- AWS Access Key – security information "access_key_id"
- AWS Secret Key – security information "secret_key_id"
- RFC Destination – RFC destination defined in TCP/IP RFC
- Path for TMP files – directory on the SAP system where the temporary files will be stored
The path for TMP files must be visible to the Java connector instance. If your SAP system is a cluster consisting of multiple physical machines, configure NFS (Network File System) so that all application servers write temporary data to one shared location visible to the Java connector instance. With this configuration, storage operations on the S3 storage work regardless of the actual SAP application server.
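On the NFS server, the shared TMP directory could be exported roughly as follows; the path and client subnet are illustrative, not values from this guide:

```
# /etc/exports on the NFS server -- path and client subnet are examples
/sapmnt/DVQ/tmp  10.0.0.0/24(rw,sync,no_root_squash)
```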
Complete the creation of the storage by confirming (F8).
Redshift storage
The AWS Redshift storage is created in a similar way to the S3 storage, but with different settings:
Entries explained:
- Storage ID – Name of the storage
- Storage Type – Choose REDSHIFT storage type
- Description – Extended description of the storage for easier identification
- Database Name – Name of DB in Redshift cluster
- Schema Name – Name of the schema (normally public)
- Redshift host – Redshift server hosting the Redshift service
- Port – Port on which the Redshift service listens (default 5439)
- Username – Redshift user created in Redshift user group
- Password – Password for the JDBC connection
- Java connector RFC – AWS RFC destination (you may use the same one as for S3 storage)
- Driver engine – Use REDSHIFT
- Driver Classname – Classname of the driver used for loading (recent version is com.amazon.redshift.jdbc41.Driver)
- Driver path – Logical name of the driver file
- JDBC Login TimeOut in Seconds – Threshold for JDBC timeout
- Password for JDBC Connection is Hashed – If checked, enter a hashed password. Use the report described in the section Password hash generator to generate the hash.
- Referenced Storage – Defines which S3 storage will be used by Redshift
- SSL Mode
- Enable SSL – Checked if SSL authentication should be used
Finish the creation of the storage by confirming (F8). If the SAP system can authenticate against AWS Redshift and receives the expected result of the SQL command 'use database', the creation of the storage was successful.