(SM-1911) S3/Redshift Storage Setup

Prerequisites

Open Ports

In a controlled network environment, it is common to have firewall rules in place. To enable communication of the SAP system with AWS, the following ports must be reachable from the SAP system:

Port      Type          AWS service
5439      tcp           Redshift
80/443    http/https    S3

These are the default port numbers of the AWS services.
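Reachability can be verified directly from the SAP host, for example with netcat; the endpoints below are placeholders for your actual Redshift cluster endpoint and S3 regional endpoint:

$ nc -vz <cluster-endpoint>.redshift.amazonaws.com 5439
$ nc -vz s3.<region>.amazonaws.com 443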

AWS User

We recommend creating distinct users for every SAP system connected to the AWS services in order to isolate each system's data.

The recommended user names mirror SAP's guideline for user names: <sid>adm => <sid>hdp.

S3 bucket

You must create the S3 bucket manually using the AWS console; Datavard Storage Management does not create it automatically.

  • The customer has to provide the details needed for the connection to the AWS S3 service, including the security pair ("access_key_id" / "secret_key_id")
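For illustration, the bucket can also be created with the AWS CLI instead of the console; the bucket name and region below are hypothetical examples:

$ aws s3 mb s3://<bucket-name> --region <region>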

Redshift cluster and database

You must create the Redshift cluster together with a Redshift database.

We recommend creating a dedicated database in Redshift for each SAP system. The recommended database name is sap<sid> (e.g. sapdvq).
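As a minimal sketch, the dedicated database can be created with any SQL client that speaks Redshift's PostgreSQL-compatible protocol, for example psql; the endpoint and admin user are placeholders, and dev is the default database of a new cluster:

$ psql -h <cluster-endpoint> -p 5439 -U <admin-user> -d dev -c 'CREATE DATABASE sapdvq;'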

Redshift database user

For SAP Storage Management data computation, you must grant the Datavard user SELECT permissions on some system tables in the Redshift database (table size information): grant select on pg_catalog.SVV_TABLE_INFO to dvd_load;

Also make sure that the Datavard user can run SELECT on the table pg_catalog.PG_TABLE_DEF (table existence checks): grant select on pg_catalog.PG_TABLE_DEF to dvd_load;
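A quick way to verify that the grants are effective is to run a select as the Datavard user (dvd_load is the example user from the grants above; endpoint is a placeholder):

$ psql -h <cluster-endpoint> -p 5439 -U dvd_load -d sapdvq -c 'select "table", size from pg_catalog.svv_table_info limit 5;'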

OS prerequisites (On SAP host)

This group of requirements relates to the operating systems underlying the SAP system and all of its application servers. Datavard products (e.g. Datavard Glue, OutBoard DataTiering) have been developed and tested on SUSE Linux and Windows Server 2012, but by design they are not limited to a particular operating system, provided that the requirements listed in this guide are met.

Java version 

An up-to-date Java version (JRE 1.7 or higher) should be available to the SAP <sid>adm user.
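The available version can be checked as the <sid>adm user:

$ java -version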

SAP Java Connector library

The SAP Java Connector 3.0 library libsapjco3.so, which can be downloaded from the SAP marketplace, needs to be located on the SAP application server. It should be placed in a directory referenced by the LD_LIBRARY_PATH environment variable of the <sid>adm user.

It is recommended to append a shared Datavard folder (/sapmnt/DVQ/global/security/dvd_conn) to the LD_LIBRARY_PATH variable of the <sid>adm user and to place libsapjco3.so into this folder.

$ echo $LD_LIBRARY_PATH
/usr/sap/HD4/SYS/exe/run:/usr/sap/HD4/SYS/exe/uc/linuxx86_64:/usr/sap/HD4/hdbclient:/home/hd4adm/IQ16_ODBC_CLIENT_Linux:/sapmnt/DVQ/global/security/dvd_conn

/sapmnt/DVQ/global/security/dvd_conn/ # ls -l libsapjco3.so
-rwxr-x--- 1 dvqadm sapsys 59 Apr  5 15:12 libsapjco3.so
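If the folder is not yet part of LD_LIBRARY_PATH, it can be appended in the <sid>adm user's shell profile; this sketch assumes a Bourne-compatible shell:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/sapmnt/DVQ/global/security/dvd_conn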

OS directories

The Datavard connector uses two directories dedicated to its configuration and log files:

$ ls -ld /usr/sap/DVQ/dvd_conn /sapmnt/DVQ/global/security/dvd_conn
drwx------ 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn
drwxr-xr-x 2 dvqadm sapsys 4096 --- /usr/sap/DVQ/dvd_conn

The first one (/sapmnt/<SID>/global/security/dvd_conn) is used to store drivers and is shared among the SAP application servers.

The second one (/usr/sap/<SID>/dvd_conn) stores the configuration and log files of the Java connector used by Datavard. Set the ownership and permissions appropriately for <sid>adm.

Create the directories on each SAP application server to store the Storage Management related configuration and log files; the <sid>adm user needs permissions to access these folders, as shown in the sketch below.
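A minimal sketch of creating both directories with the ownership and permissions shown above (paths follow the example system DVQ):

$ mkdir -p /sapmnt/DVQ/global/security/dvd_conn /usr/sap/DVQ/dvd_conn
$ chown dvqadm:sapsys /sapmnt/DVQ/global/security/dvd_conn /usr/sap/DVQ/dvd_conn
$ chmod 700 /sapmnt/DVQ/global/security/dvd_conn
$ chmod 755 /usr/sap/DVQ/dvd_conn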

JDBC Drivers

The JDBC protocol is used to connect to AWS Redshift. The AWS Redshift JDBC driver (RedshiftJDBC41-no-awssdk-1.2.16.1027.jar) must be stored manually on the operating system and be accessible to the Datavard connector.

We recommend storing the drivers in a folder within the connector directory, organized into sub-folders to avoid possible conflicts.

$ ls -ld /sapmnt/DVQ/global/security/dvd_conn/*
drwxr-xr-x 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/aws


$ ls -l /sapmnt/DVQ/global/security/dvd_conn/aws
-rw-r--r-- 1 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/aws/RedshiftJDBC41-no-awssdk-1.2.16.1027.jar
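The layout above can be created, for example, like this (the source path of the downloaded jar, /tmp here, is an assumption):

$ mkdir -p /sapmnt/DVQ/global/security/dvd_conn/aws
$ cp /tmp/RedshiftJDBC41-no-awssdk-1.2.16.1027.jar /sapmnt/DVQ/global/security/dvd_conn/aws/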

SSL Certificates for Java

When using JDBC over SSL, the certification authority that is part of the standard Java installation (Starfield Technologies) is used. Browser test link: https://good.sca0a.amazontrust.com/.

This means that no additional certificates are needed.
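If in doubt, you can check whether the Starfield CA is present in the Java truststore with keytool; the cacerts path varies by Java installation, and changeit is the default truststore password:

$ keytool -list -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit | grep -i starfield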

Configuration

When all prerequisites are fulfilled, further configuration is performed from the SAP system.

RFC Destination 

Java RFC

Java RFC by name refers to the Java service that is used for communication with the AWS services. Define the RFC destination of the Java connector with the following parameters:



Entries explained:

  • Connection Type – T for TCP/IP Connection
  • Activation Type – select Registered Server Program
  • Program ID – DVD_JAVA_CONN

It is important to enable the DVD_JAVA_CONN program registration in the SAP gateway (SAP Gateway Access Control Lists).
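A hedged example of a corresponding reginfo entry (SAP gateway ACL syntax; the host entries are placeholders and should follow your security policy):

P TP=DVD_JAVA_CONN HOST=local,<sap_host> ACCESS=local,<sap_host> CANCEL=local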

Java connector setup

Java connectors are configured using files that you define in the SAP system. The configuration consists of the following steps:

Logical file path definition

The first step is to map the logical path ZDVD_DIR to the OS path where the files are stored. The actual OS path was created in the section OS directories (/sapmnt/<SID>/global/security/dvd_conn).

Drivers logical file definition

As described in the section JDBC Drivers, the JDBC drivers for the AWS service connection are stored on the operating system underlying the SAP system. Define them also as logical names in the SAP system via the FILE transaction.

In our example, we are using the S3 and Redshift JDBC drivers provided by AWS. The definition of the driver-specific folders looks as follows:

ZDVD_AWS_REDSHIFT_DRIVERS refers to the folder in which AWS JDBC drivers provided by Amazon have been placed in the section JDBC Drivers.

Java connector configuration

Storage Management uses a Java connector for communication with S3 and for running SQL queries against Redshift.

You should manually configure the following tables:

/DVD/JAVA_HDR – maps an RFC destination to the Datavard Java connector version.

If required, the table entries can be maintained through the t-code /dvd/rl_tm.

Table example:

/DVD/JAVA_CONFIG – stores parameters for Datavard Java connectors. Fill in the table with entries through the t-code /dvd/rl_tm.

Sample configuration:


The Java service is started with a system command. You can adjust the name of this command in the table /DVD/JAVA_CONFIG with the parameter JAVA_START_CMD. The default name of the command is ZDVD_START_JAVA. If the system command doesn't exist, it is created automatically. You can view the system commands through the transaction SM69.

On Linux, another system command is required to set executable rights for the configuration files (chmod 755 <filename>). Its name can be adjusted with the parameter CHMOD_CMD; the default value is ZDVD_CHMOD.


Prerequisites:

Multiple entries in the /DVD/JAVA_CONFIG table define the file path and the filename of generated files. These are marked as (generated) in the detailed explanation below.

Please note that, as a prerequisite, you must create all the paths specified in the table beforehand (folders only; the files themselves are generated).

Entries explained:

  • CHMOD_CMD – name of the system command for setting executable rights
  • CLIENT – ID of the SAP system source data client
  • CONFIG_AS_PATH – path and filename of the configuration file that provides the Java connector with authentication information for the SAP system (generated)
  • CONFIG_PATH – path and filename of the configuration file that provides the Java connector with the program ID and SAP gateway information (generated)
  • CONN_COUNT – number of connections registered at the gateway
  • JAR_PATH – path and filename of the Datavard Java connector JAR file. The jar is generated in the specified folder under the specified filename (its content is part of the transport) (generated)
  • JAR_ARG – Java arguments used when starting the Java connector component
  • JAVA_EXE – path to the Java binary used by the <sid>adm user
  • JAVA_LIBRARY_ID – identifier of the Java connector libraries (VERSION_ID in table /DVD/JAVA_LIB). This content is generated automatically into a "lib" folder at the JAR_PATH location (generated)
  • JAVA_START_CMD – OS command for starting the Java process in the background. The Java binary path is taken from the JAVA_EXE parameter.
  • JAVA_VENDOR – Java machine vendor
  • LOG4J_CONFIG_PATH – path to log4j2.xml. This XML file, which contains the logging configuration and its maintenance settings, is generated from SAP (generated)
  • LOG4J_LOG_DELETION – log4j2.xml parameter for log file retention: the number of days to keep log files before they are automatically deleted (generated)
  • LOG4J_LOG_LEVEL – log4j2.xml parameter for the default "root" log4j level (generated)
  • LOG_FILES_PATH – log4j2.xml parameter for the path to the log4j log files (generated)
  • MAX_RAM_USED – the amount of memory dedicated to the Java connector
  • USERNAME – RFC user defined in the section SAP RFC role and user
  • PASSWORD – password of the RFC user (defined above) in hashed form. Use the instructions in the section Password hash generator to generate the hash.
  • PEAK_LIMIT – maximum number of connections that can be created for a destination simultaneously
  • PROG_ID – program ID of this Java connector (see the section Java RFC)
  • REP_DEST – Java connector client destination from which the repository is obtained
  • WORK_THREAD_MAX – number of threads that can be used by the Java connector
  • WORK_THREAD_MIN – number of threads always kept running by the Java connector

Password hash generator

Use the report /DVD/XOR_GEN for this purpose.


Storage Management setup

A generic Datavard software component, "Reuse Library", is used for the setup. The required component is "Storage Management".

Datavard Storage Management facilitates transparent communication with different types of storage, which includes various types of databases, Hadoop, and AWS: S3 for flat files and Redshift for structured data.

S3 storage

In order to store data transparently, you should define two types of AWS storage in Storage Management:

  • S3 storage which facilitates a transfer of files to S3
  • Redshift storage which enables data replication between SAP tables and Redshift tables

Create S3 storage through the transaction:

/DVD/SM_SETUP > [Edit mode] > [New storage]



Entries explained:

  • Storage ID – name of the storage
  • Storage Type – choose AWS_S3 for S3
  • Description – extended description of the storage for easier identification
  • AWS Bucket name – name of the existing bucket in S3
  • AWS Region – region where the bucket exists (we recommend that the Redshift cluster also exists in the same region)
  • AWS Access Key – security information "access_key_id"
  • AWS Secret Key – security information "secret_key_id"
  • RFC Destination – TCP/IP RFC destination defined in the section Java RFC
  • Path for TMP files – directory on the SAP system where the temporary files will be stored

    The path for TMP files must be visible to the instance of the Java connector. If your SAP system is a cluster consisting of multiple physical machines, you need to configure NFS (Network File System) so that all application servers write temporary data into one shared location that is visible to the Java connector instance (see the sketch below). With this configuration, storage operations on the S3 storage work regardless of which SAP application server is used.
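    A minimal sketch of sharing such a location over NFS on each application server; the NFS server, export, and mount point are hypothetical examples:

    $ mount -t nfs <nfs-server>:/export/dvd_tmp /usr/sap/DVQ/dvd_tmp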

Complete the creation of the storage by confirming (F8).

Redshift storage

The AWS Redshift storage is created in a similar way to the S3 storage, but with different settings:


Entries explained:

  • Storage ID – Name of the storage
  • Storage Type – Choose the REDSHIFT storage type
  • Description – Extended description of the storage for easier identification
  • Database Name – Name of the database in the Redshift cluster
  • Schema Name – Name of the schema (normally public)
  • Redshift host – Redshift server hosting the Redshift service
  • Port – Port on which the Redshift service listens (default 5439)
  • Username – Redshift user created in the Redshift user group
  • Password – Password for the JDBC connection
  • Java connector RFC – AWS RFC destination (you may use the same one as for the S3 storage)
  • Driver engine – Use REDSHIFT
  • Driver Classname – Class name of the driver used for loading (for recent driver versions, com.amazon.redshift.jdbc41.Driver)
  • Driver path – Logical name of the driver file
  • JDBC Login TimeOut in Seconds – Threshold for the JDBC login timeout
  • Password for JDBC Connection is Hashed – If checked, enter a hashed password. Use the instructions in the section Password hash generator to generate the hash.
  • Referenced Storage – Defines which S3 storage will be used by Redshift
  • SSL Mode
  • Enable SSL – Checked if SSL authentication should be used

Finish the creation of the storage by confirming (F8). If the SAP system is able to authenticate against AWS Redshift and receives the expected result of the SQL command 'use database', the creation of the storage is considered successful.

Troubleshooting

In the case of any issues with the Java connector, you can display the Java logs from the application server with the report /DVD/SM_HIVE_DISPLAY_JAVA_LOG to determine the source of the issue.


Entries explained:

  • RFC Destination – destination of the RFC with the Java service
  • Num. of last lines to display – how many of the last lines of the log should be displayed
  • Name of file with log – filename of the file with the Java log (in the default destination)
  • Is log archived? – check if you need to display an archived log (date obligatory)
  • Date of log – date of the archived log which should be displayed