(SM-2002) S3/Redshift Storage Setup

Prerequisites

Open Ports

In a controlled network environment it is common to have firewall rules in place. To enable communication between the SAP system and AWS, the following ports must be reachable from the SAP system:

Port      Type          AWS service
5439      tcp           Redshift
80/443    http/https    S3

These are default port numbers of AWS services.
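
You can verify from the SAP host that the required endpoints are reachable, for example with a simple TCP check; the Redshift endpoint and region below are placeholders, substitute your own values:

$ nc -vz example-cluster.abc123xyz0.eu-central-1.redshift.amazonaws.com 5439
$ nc -vz s3.eu-central-1.amazonaws.com 443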

AWS User

We recommend creating a distinct AWS user for every SAP system connected to the AWS services in order to isolate each system's data.

The recommended user names mirror SAP's naming convention: <sid>adm => <sid>hdp.
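
For illustration only, such a user could be created with the AWS CLI; the user name below follows the <sid>hdp convention for a system with SID DVQ, and the policies you attach afterwards depend on your security setup:

$ aws iam create-user --user-name dvqhdp
$ aws iam create-access-key --user-name dvqhdp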

S3 bucket

You must create the S3 bucket manually, using the AWS console. Datavard Storage Management does not create it automatically.

  • The customer has to provide the details needed to connect to the AWS S3 service, including the security pair ("access_key_id" / "secret_key_id")
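
As a sketch, the bucket could also be created with the AWS CLI instead of the console; bucket name and region are placeholders:

$ aws s3 mb s3://dvq-sap-landing --region eu-central-1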

Redshift cluster and database

You must create a Redshift cluster together with a Redshift database.

We recommend creating a dedicated database in Redshift for each SAP system. The recommended database name is sap<sid> (for example sapdvq).
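
Because Redshift is PostgreSQL-compatible, the dedicated database can be created with any SQL client, for example psql; the cluster endpoint and admin user below are placeholders:

$ psql -h example-cluster.abc123xyz0.eu-central-1.redshift.amazonaws.com -p 5439 -U awsadmin -d dev -c "CREATE DATABASE sapdvq;"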

Redshift database user

You must grant the Datavard database user SELECT permissions on certain system tables in the Redshift database, which Storage Management uses to compute table information:

  • Table size: "grant select on pg_catalog.SVV_TABLE_INFO to dvd_load;"
  • Table existence: "grant select on pg_catalog.PG_TABLE_DEF to dvd_load;"
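
The grants can be applied, for example, with psql connected as an administrative user; the cluster endpoint and admin user are placeholders, and dvd_load is the Datavard database user from the examples above:

$ psql -h example-cluster.abc123xyz0.eu-central-1.redshift.amazonaws.com -p 5439 -U awsadmin -d sapdvq \
    -c "grant select on pg_catalog.SVV_TABLE_INFO to dvd_load;" \
    -c "grant select on pg_catalog.PG_TABLE_DEF to dvd_load;"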

OS prerequisites (On SAP host)

This group of requirements relates to the operating system underlying the SAP system and all of its application servers. Datavard products (e.g. Datavard Glue, OutBoard DataTiering) have been developed and tested on SUSE Linux and Windows Server 2012. However, by design they are not limited to a particular operating system, provided the requirements listed in this guide are met.

OS directories

The Datavard connector uses a directory dedicated to its configuration files:

$ ls -ld /sapmnt/DVQ/global/security/dvd_conn
drwx------ 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn

The folder is used to store drivers and is shared among the SAP application servers. Set the ownership and permissions appropriately for <sid>adm.
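
For example, on a Linux application server the directory can be created with the ownership and permissions shown above (using the DVQ system from the listing):

$ mkdir -p /sapmnt/DVQ/global/security/dvd_conn
$ chown dvqadm:sapsys /sapmnt/DVQ/global/security/dvd_conn
$ chmod 700 /sapmnt/DVQ/global/security/dvd_conn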

JDBC Drivers

The JDBC protocol is used to connect to AWS Redshift. The AWS Redshift JDBC driver (RedshiftJDBC41-no-awssdk-1.2.16.1027.jar) must be stored manually on the operating system and be accessible to the Datavard connector.

We recommend storing the drivers in a folder within the connector directory, organized in sub-folders to avoid possible conflicts.

$ ls -ld /sapmnt/DVQ/global/security/dvd_conn/*
drwxr-xr-x 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/aws


$ ls -l /sapmnt/DVQ/global/security/dvd_conn/aws
-rw-r--r-- 1 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/aws/RedshiftJDBC41-no-awssdk-1.2.16.1027.jar
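
A minimal sketch of placing the driver, assuming the jar has already been downloaded to the current directory:

$ mkdir -p /sapmnt/DVQ/global/security/dvd_conn/aws
$ cp RedshiftJDBC41-no-awssdk-1.2.16.1027.jar /sapmnt/DVQ/global/security/dvd_conn/aws/
$ chmod 644 /sapmnt/DVQ/global/security/dvd_conn/aws/RedshiftJDBC41-no-awssdk-1.2.16.1027.jar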

SSL Certificates for Java

When using JDBC over SSL, the certification authority used by AWS (Starfield Technologies) is already part of the standard Java installation. Browser test link: https://good.sca0a.amazontrust.com/.

This means that no additional certificates are needed.
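
If you want to double-check, the Starfield root is usually present in the Java truststore and the test endpoint is reachable with curl; the cacerts path and the default keystore password 'changeit' may differ in your Java distribution:

$ keytool -list -keystore "$JAVA_HOME/jre/lib/security/cacerts" -storepass changeit | grep -i starfield
$ curl -I https://good.sca0a.amazontrust.com/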

Java connector

The Java connector is a critical middleware component. Please follow the steps in the Java Connector Setup guide to set it up before you continue.

Configuration

When all prerequisites are fulfilled, further configuration is performed from the SAP system.

Drivers logical file definition

As described in the JDBC Drivers section, the JDBC drivers for the AWS service connection are stored on the operating system underlying the SAP system. Define them also as logical file names in the SAP system via the FILE transaction.

In our example, we use the S3 and Redshift JDBC drivers provided by AWS. The definition of the driver-specific folders looks as follows:

ZDVD_AWS_REDSHIFT_DRIVERS refers to the folder in which the AWS JDBC drivers provided by Amazon were placed in the JDBC Drivers section.



Password hash generator

Hashed passwords (used later in the storage definitions) are generated with the report /DVD/XOR_GEN.


Storage Management setup

The setup uses the generic Datavard software component “Reuse Library”. The required component is “Storage Management”.

Datavard Storage Management facilitates transparent communication with different types of storage. This includes various types of databases, Hadoop, and AWS: S3 for flat files and Redshift for structured data.

S3 storage

To store data transparently, you should define two types of AWS storage in Storage Management:

  • S3 storage which facilitates a transfer of files to S3
  • Redshift storage which enables data replication between SAP tables and Redshift tables

Create S3 storage through the transaction:

/DVD/SM_SETUP > [Edit mode] > [New storage]



Entries explained:

  • Storage ID – name of the storage
  • Storage Type – choose AWS_S3 for S3
  • Description – extended description of the storage for easier identification
  • AWS Bucket name – name of the existing bucket in S3
  • AWS Region – region where the bucket exists (we recommend that the Redshift cluster exists in the same region)
  • AWS Access Key – security information "access_key_id"
  • AWS Secret Key – security information "secret_key_id"
  • RFC Destination – RFC destination defined as TCP/IP RFC
  • Path for TMP files – directory on the SAP system where the temporary files will be stored

    The path for TMP files must be visible to the Java connector instance. If your SAP system is a cluster consisting of multiple physical machines, you need to configure NFS (Network File System) so that all application servers write their temporary data into one shared location that is visible to the Java connector instance. With this configuration you can perform storage operations on the S3 storage regardless of the actual SAP application server; a minimal NFS sketch follows below.
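
    A minimal sketch, assuming a hypothetical shared directory /sapmnt/DVQ/dvd_conn_tmp exported from the host running the Java connector (host name, path, and subnet are placeholders):

    $ cat /etc/exports
    /sapmnt/DVQ/dvd_conn_tmp  10.0.0.0/24(rw,sync,no_root_squash)

    $ mount -t nfs jco-host:/sapmnt/DVQ/dvd_conn_tmp /sapmnt/DVQ/dvd_conn_tmp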

Complete the creation of the storage by confirming (F8).

Redshift storage

The AWS Redshift storage is created in the same way as the S3 storage, but with different settings:


Entries explained:

  • Storage ID – Name of the storage
  • Storage Type – Choose the REDSHIFT storage type
  • Description – Extended description of the storage for easier identification
  • Database Name – Name of the database in the Redshift cluster
  • Schema Name – Name of the schema (normally public)
  • Redshift host – Redshift server hosting the Redshift service
  • Port – Port on which the Redshift service listens (default 5439)
  • Username – Redshift user created in the Redshift user group
  • Password – Password for the JDBC connection
  • Java connector RFC – AWS RFC destination (you may use the same one as for the S3 storage)
  • Driver engine – Use REDSHIFT
  • Driver Classname – Class name of the driver used for loading (for recent driver versions com.amazon.redshift.jdbc41.Driver)
  • Driver path – Logical name of the driver file
  • JDBC Login TimeOut in Seconds – Threshold for the JDBC login timeout
  • Password for JDBC Connection is Hashed – If checked, enter a hashed password. Use the Password hash generator described above to generate the hash.
  • Referenced Storage – Defines which S3 storage will be used by Redshift
  • SSL Mode
  • Enable SSL – Checked if SSL authentication should be used

Finish the creation of the storage by confirming (F8). If the SAP system is able to authenticate against AWS Redshift and receives the expected result of the SQL command 'use database', the creation of the storage is considered successful.
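
The same connection parameters can also be verified outside of SAP, for example with psql over SSL; the cluster endpoint is a placeholder and dvd_load is the Redshift user from the prerequisites:

$ psql "host=example-cluster.abc123xyz0.eu-central-1.redshift.amazonaws.com port=5439 dbname=sapdvq user=dvd_load sslmode=require"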