(SM-2108) S3/Redshift Storage Setup

S3 and Redshift can both be used as a storage layer for Datavard Glue or Outboard Archiving. S3 provides inexpensive storage for landing zone or cold archive purposes. Redshift comes at a higher cost but can be used as directly queryable analytical storage, or as a hot archive suitable for more aggressive SAP archiving strategies.

It is recommended to deploy AWS resources so that the development, test, and production environments are isolated. This means that when you archive or replicate data to S3, there should be one S3 bucket per SAP system. With a Redshift deployment, a typical scenario would be one cluster with separate databases for the development and quality environments, and one cluster for the production SAP environment.

While the setup itself is simple, in our experience it usually takes around 2 weeks, because multiple teams (SAP Basis, Network, AWS) need to be involved to fulfill the prerequisites.

The person responsible for the setup should have general knowledge of AWS S3 and AWS Redshift, SAP Basis, and networking basics. You will need to deploy resources on AWS and create access credentials, import Datavard transports to the SAP system, install JRE and JDBC drivers on the SAP application servers, and make sure that the SAP and AWS environments can communicate.

General Prerequisites

SAP NetWeaver release

Datavard storage management requires SAP NW 7.01 SP15 or higher.

Open Ports

In a controlled network environment, it is common to have firewall rules in place. To enable communication between SAP systems and AWS, outbound communication from the SAP system to the following ports on the AWS side must be allowed:

Port      Type          AWS service
5439      tcp           Redshift
80/443    http/https    S3
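
Before continuing, the firewall rules can be checked from an SAP application server. The following is a minimal sketch in Python that only tests whether a TCP connection can be opened. The function names and the host names in the usage note are placeholders and not part of the Datavard software.

```python
import socket

# Ports from the table above, grouped by AWS service.
REQUIRED_PORTS = {"Redshift": [5439], "S3": [80, 443]}


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_endpoints(endpoints: dict) -> dict:
    """Check every required port for each service.

    endpoints maps a service name from REQUIRED_PORTS to a host name.
    The result maps (service, host, port) to True or False.
    """
    results = {}
    for service, host in endpoints.items():
        for port in REQUIRED_PORTS[service]:
            results[(service, host, port)] = port_is_reachable(host, port)
    return results
```

For example, check_endpoints({"S3": "<your S3 endpoint>", "Redshift": "<your cluster endpoint>"}) returns one entry per port. A value of False usually points to a missing firewall rule or a routing problem, but it does not tell you which of the two.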

Datavard Storage Management allows encrypted communication through the public internet with S3 and Redshift, but for production deployment, it is recommended to have some kind of secure connectivity in place (VPN).

Please refer to AWS documentation for more details.

Java connector

The Java connector is a critical middleware component used for both S3 and Redshift storage. Please follow the steps in the article Java Connector Setup to set it up before you continue.

Make sure that your Java connector includes the AWS SDK libraries.

S3

This chapter describes steps to establish a connection to S3 storage.

S3 storage is usually used in the following scenarios:

  • Landing zone for raw SAP data from Datavard Glue
  • Archiving storage for Outboard DataTiering (SARA archives or tabular data when used as transparent binary storage)
  • Intermediate storage used in combination with AWS Redshift

S3 prerequisites

S3 bucket

You must identify a suitable S3 bucket on AWS or manually create a new one in your AWS subscription based on your requirements. We recommend keeping public access disabled and enabling server-side encryption for added security. Datavard Storage Management supports both the SSE-S3 and SSE-KMS server-side encryption options.

Note down your S3 region, bucket name and KMS key ID (optional) as this information will be required during storage setup.

AWS User for programmatic access

We recommend creating a distinct user for every SAP system connected to the AWS services in order to isolate each system's data. Please refer to AWS documentation for best security practices. 

You must also generate a credentials pair ("access_key_id" and "secret_key_id"). Make sure to rotate the access keys regularly.

These credentials will be used in Storage Management for read/write access to the S3 bucket. If Redshift is used, they will also be used to load/unload data from the Redshift cluster.

Root user

Never use the root account to provide access to your AWS subscription. Instead, create separate technical users for programmatic access with minimal authorizations.

S3 Policy

Assign an appropriate policy so that the technical user can access the S3 bucket. Please follow the principle of least privilege.

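For a standard bucket without custom encryption keys, the following policy is sufficient. It is written as a bucket policy: the Principal element names the technical user, so the policy is attached to the bucket itself.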

Basic bucket policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::MY-ACCOUNT-ID:user/MY-USER"
            },
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::MY-BUCKET/*",
                "arn:aws:s3:::MY-BUCKET"
            ]
        }
    ]
}
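
If you maintain one bucket and one technical user per SAP system, the same policy has to be produced several times with different names. The following sketch in Python fills the three placeholders of the policy above. The function name and the sample values are illustrative assumptions only.

```python
import json


def build_bucket_policy(account_id: str, user_name: str, bucket: str) -> dict:
    """Return the basic bucket policy shown above for one IAM user and bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:user/{user_name}"
                },
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                    "s3:ListMultipartUploadParts",
                    "s3:AbortMultipartUpload",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; replace them with your account ID, user and bucket.
    policy = build_bucket_policy("123456789012", "MY-USER", "MY-BUCKET")
    print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the bucket's policy editor. Review it before applying it, since the sketch performs no validation of the names you pass in.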

S3 storage in Storage Management

Datavard Storage Management facilitates transparent communication with different types of storage.

Create S3 storage through the transaction:

/DVD/SM_SETUP

Entries explained:

  • Storage ID – name of the storage
  • Storage Type – choose AWS_S3 for S3
  • Description – extended description of the storage for easier identification
  • AWS Bucket name – name of the existing bucket in S3
  • AWS Region – region where the bucket exists (it is recommended that the Redshift cluster is in the same region)
  • Path Prefix – path to the landing area within the AWS bucket
  • AWS Access Key – security information "access_key_id"
  • AWS Secret Key – security information "secret_key_id"
  • Java connector RFC – TCP/IP RFC destination for communication with the Datavard Java connector
  • Java call Repeat – number of times a failed call should be retried
  • Repeat delay – delay between retried calls
  • AWS KMS KeyID – Key ID of the key used to encrypt data on S3 (optional)

    The path for TMP files must be visible to the Java connector instance. If your SAP system is a cluster consisting of multiple physical machines, you need to configure NFS (Network File System) so that all application servers write temporary data to one shared location that is visible to the Java connector instance. With this configuration, you can perform storage operations on S3 storage regardless of which SAP application server is used. /sapmnt is usually an NFS directory shared among all SAP application servers.
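
One way to confirm that the temporary directory is really shared is to write a marker file on one application server and read it on the host that runs the Java connector. The sketch below is in Python and assumes you can run a script on both hosts. The function names and the marker file naming are hypothetical.

```python
import os
import uuid


def write_marker(shared_dir: str) -> str:
    """Create a small marker file in shared_dir and return its file name."""
    name = f"dvd_tmp_check_{uuid.uuid4().hex}.txt"
    with open(os.path.join(shared_dir, name), "w", encoding="utf-8") as handle:
        handle.write("visible")
    return name


def marker_is_visible(shared_dir: str, name: str) -> bool:
    """Return True if the marker file can be read from this host."""
    try:
        with open(os.path.join(shared_dir, name), encoding="utf-8") as handle:
            return handle.read() == "visible"
    except OSError:
        return False


def remove_marker(shared_dir: str, name: str) -> None:
    """Delete the marker file once the check is done."""
    os.remove(os.path.join(shared_dir, name))
```

Run write_marker on one application server, pass the returned file name to marker_is_visible on the Java connector host, and call remove_marker afterwards. If the second call returns False, the two hosts do not see the same directory.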

Complete the creation of the storage by confirming (F8).

Redshift

AWS Redshift is ideal as an archive for frequently accessed tabular data for Datavard Outboard DataTiering or as analytics storage for Datavard Glue.

Redshift prerequisites

S3 storage

Redshift storage requires existing S3 storage to work. Make sure that you have finished the steps in the S3 section and have a working S3 storage ready.

Cluster and database

You must create a Redshift cluster that will host your Redshift database. The sizing depends heavily on the use case and the amount of data that will be stored, so please use the "Help me choose" option on the cluster creation page to size the cluster properly.

It is highly recommended that the cluster is in the same region as your S3 bucket.

Redshift schema and database user

Create a schema and a database user that will be used by Datavard Storage Management.

In this example, the SID of the SAP system is DVQ, and the user and schema names follow the recommended naming conventions. Adjust the SQL statements to fit your environment.

--create user and schema
create user datavard_dvq password 'my-difficult-password';
create schema sapdvq;
alter schema sapdvq owner to datavard_dvq; 
-- assign required permissions to system tables
grant select on pg_catalog.SVV_TABLE_INFO to datavard_dvq;
grant select on pg_catalog.PG_TABLE_DEF to datavard_dvq;
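
If several SAP systems are connected, the statements above can be generated per SID. The following Python sketch assumes that the naming convention is datavard_<sid> for the user and sap<sid> for the schema, which is inferred from the DVQ example and not confirmed elsewhere. The function name is hypothetical.

```python
from typing import List


def redshift_setup_sql(sid: str, password: str) -> List[str]:
    """Return the setup statements from the example above for one SAP SID."""
    # An SAP SID has three alphanumeric characters and starts with a letter.
    if not (len(sid) == 3 and sid.isascii() and sid.isalnum() and sid[0].isalpha()):
        raise ValueError(f"not a valid SAP SID: {sid!r}")
    user = f"datavard_{sid.lower()}"
    schema = f"sap{sid.lower()}"
    # Double any single quote so the password stays one SQL string literal.
    quoted = password.replace("'", "''")
    return [
        f"create user {user} password '{quoted}';",
        f"create schema {schema};",
        f"alter schema {schema} owner to {user};",
        f"grant select on pg_catalog.SVV_TABLE_INFO to {user};",
        f"grant select on pg_catalog.PG_TABLE_DEF to {user};",
    ]
```

Calling redshift_setup_sql("DVQ", "my-difficult-password") reproduces the five statements shown above. The password ends up in plain text in the generated script, so treat the output accordingly.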

JDBC Drivers

The JDBC protocol is used to connect to AWS Redshift. The AWS Redshift JDBC driver (RedshiftJDBC41-no-awssdk-1.2.16.1027.jar) must be manually stored on the operating system and be accessible to the Datavard Java connector.

It is recommended to use the default path as in the example below to utilize the predefined logical paths in SAP. Make sure that <sid>adm:sapsys is the owner of the directory dvd_conn and all its contents.

$ ls -ld /sapmnt/DVQ/global/security/dvd_conn/*
drwxr-xr-x 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/redshift

$ ls -l /sapmnt/DVQ/global/security/dvd_conn/redshift/*
-rwxr-xr-x 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/redshift/RedshiftJDBC41-no-awssdk-1.2.16.1027.jar
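
The ownership requirement can also be checked with a script instead of reading the listing by eye. The following Python sketch walks the directory and reports every path that is not owned by the expected user and group. The function names are hypothetical, and the dvqadm and sapsys values in the usage note come from the example above.

```python
import grp
import os
import pwd
from typing import List, Tuple


def ids_for(user: str, group: str) -> Tuple[int, int]:
    """Look up the numeric uid and gid for a user and group name."""
    return pwd.getpwnam(user).pw_uid, grp.getgrnam(group).gr_gid


def find_wrong_owner(root: str, uid: int, gid: int) -> List[str]:
    """Return all paths under root (root included) not owned by uid:gid."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    wrong = []
    for path in paths:
        info = os.lstat(path)  # lstat: report a symlink itself, not its target
        if info.st_uid != uid or info.st_gid != gid:
            wrong.append(path)
    return wrong
```

For the example system, find_wrong_owner("/sapmnt/DVQ/global/security/dvd_conn", *ids_for("dvqadm", "sapsys")) should return an empty list. Any path it returns needs its ownership corrected.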

Redshift storage in Storage Management

The AWS Redshift storage is created in much the same way as the S3 storage.

Open transaction /DVD/SM_SETUP > Create > Enter Storage ID and Storage Type "REDSHIFT"

Entries explained:

  • Storage ID – Name of the storage
  • Storage Type – Choose REDSHIFT storage type
  • Description – Extended description of the storage for easier identification
  • Referenced Storage – Defines which S3 storage will be used by Redshift
  • Java connector RFC – RFC connection to the Datavard Java connector
  • Redshift host – Redshift server hosting the Redshift service
  • Port – Port number on which the Redshift service is accessible
  • Database name – Name of the database in the Redshift cluster
  • Database schema – Name of the schema (normally public)
  • Enable update – If checked, delta loads will be merged into existing data
  • Use extended escaping – If checked, exotic newline characters in data are escaped
  • Driver engine – Use Amazon Redshift
  • Driver Classname – Class name of the driver used for loading (for the recent version, com.amazon.redshift.jdbc41.Driver)
  • Driver path – Logical name of the driver directory
  • Username – Redshift user created in the Redshift user group
  • Password – Type in the password in the lower line and use the [Hash] button
  • Password hashed – Check if you stored the password in encrypted form
  • Login timeout (seconds) – Threshold for JDBC timeout
  • Enable SSL – Check if SSL authentication should be used
  • SSL Mode – There are two options for SSL mode:
    verify-ca (default option, verifies that the certificate comes from a trusted CA)
    verify-full (both the CA and the hostname listed in the certificate are verified)

Finish the creation of the storage by confirming (F8). If the SAP system is able to authenticate against AWS Redshift and receives the expected result of the SQL command 'use database', the creation of the storage is considered successful.