(SM-2408) S3/Redshift Storage Setup

S3, Redshift, and Redshift Serverless storages can all be used as a storage layer for SNP Glue™ or SNP OutBoard™ ERP Archiving.

Out of these, Redshift Serverless offers the best price/performance for most use cases involving structured data.

S3 provides inexpensive storage for landing zones, cold archive storage purposes, and unstructured data.

Redshift with a dedicated cluster is useful for use cases with high, continuous loads.

It is recommended to deploy AWS resources so that the development, test, and production environments are isolated. This means that when you archive or replicate data to S3, there should be one S3 bucket per SAP system. With Redshift deployment, a typical scenario would be one cluster with separate databases for the development and quality environments, and one cluster for the production SAP environment.

While the setup itself is simple, experience shows that it usually takes around two weeks, since multiple teams (SAP Basis, network, AWS) need to be involved to fulfill the prerequisites.

The person responsible for the setup should have general knowledge of AWS S3 and AWS Redshift, SAP Basis, and networking. You will need to deploy resources on AWS and create access credentials, import our transports to the SAP system, install the JRE and JDBC drivers on the SAP application servers, and make sure that the SAP and AWS environments can communicate.

General Prerequisites

SAP NetWeaver release

Storage management requires SAP NW 7.01 SP15 or higher.

Open Ports

In a controlled network environment, it is common to have firewall rules in place. To enable communication between SAP systems and AWS, outbound communication from the SAP system to the following ports on the AWS side needs to be allowed:

Port      Type         AWS service

5439      tcp          Redshift

80/443    http/https   S3

Example of a simple telnet connectivity test:

sapserver01:/ # telnet s3.eu-central-1.amazonaws.com 443
Trying 3.5.139.101...
Connected to s3.eu-central-1.amazonaws.com.
Escape character is '^]'.
^]
telnet> q
Connection closed.

Storage Management allows encrypted communication with S3 and Redshift through the public internet, but for production deployments, it is recommended to have secure connectivity in place (such as a VPN).

Refer to AWS documentation for more details.

Java connector

The Java connector is a critical middleware component used for both S3 and Redshift storage. Follow the steps in the chapter Java Connector Setup to set it up before you continue.

S3

This chapter describes the steps to establish a connection to S3 storage.

S3 storage is usually used in the following scenarios:

  • Landing zone for raw SAP data from SNP Glue™

  • Archiving storage for SNP OutBoard™ Data Tiering (SARA archives or tabular data when used as transparent binary storage)

  • Intermediate storage used in combination with AWS Redshift

S3 Prerequisites

S3 bucket

You must identify a suitable S3 bucket on AWS or manually create a new one in your AWS subscription based on your requirements. We recommend keeping public access disabled and enabling server-side encryption for added security. Storage management supports both SSE-S3 and SSE-KMS server-side data encryption options.

Note down your S3 region, bucket name, and KMS key ID (optional) as this information will be required during storage setup.

If you use a Customer Managed Key, make sure to update the key policy to allow key use for the user/role (principal) accessing the S3 bucket.
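A minimal key policy statement for this purpose might look like the following sketch. The account ID and user name are placeholders to adapt to your environment; the action list is a typical "key users" set for S3 server-side encryption:

```json
{
  "Sid": "AllowKeyUseForS3Access",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::MY-ACCOUNT-ID:user/MY-USER"
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
}
```

Add this statement to the Statement array of the key policy; "Resource": "*" in a key policy refers to the key itself.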

AWS User for programmatic access

We recommend creating a distinct user for every SAP system connected to the AWS services to isolate each system's data. Refer to AWS documentation for best security practices. 

A credentials pair (access_key_id / secret_key_id) needs to be generated. Make sure to rotate access keys regularly.

These credentials will be used in Storage Management for read/write access to the S3 bucket. If Redshift is used, they will also be used to load/unload data from the Redshift cluster.
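Note that the secret key itself is never transmitted to AWS; the SDK uses it to derive request signatures. As a minimal illustration (not part of Storage Management; a sketch of the AWS Signature Version 4 key-derivation step, using only the Python standard library):

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 key-derivation chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the per-day, per-region, per-service SigV4 signing key.

    date_stamp is in YYYYMMDD format, e.g. "20240115".
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Example with a dummy secret (never hard-code real credentials):
key = derive_signing_key("DUMMY/SECRET/KEY", "20240115", "eu-central-1", "s3")
```

The AWS SDK performs this derivation internally; the sketch only illustrates why the secret key never leaves the application server and why key rotation is cheap to do regularly.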

 

Root user

Never use the root account to provide access to your AWS subscription. Instead, create separate technical users for programmatic access with minimal authorizations. Note that the root account cannot assume any role, so it will not work with AssumeRole (a restriction from Amazon AWS).

S3 Policy

Assign an appropriate policy to the technical user, so it can access the S3 bucket. Follow the policy of least privilege. 

For a standard bucket without custom encryption keys, the following policy is sufficient.

Basic bucket policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::MY-ACCOUNT-ID:user/MY-USER"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::MY-BUCKET/*",
        "arn:aws:s3:::MY-BUCKET"
      ]
    }
  ]
}

S3 Storage in Storage Management

Storage Management facilitates transparent communication with different storage types.

Create S3 storage through the transaction:

/DVD/SM_SETUP

Entries explained:

  • Storage ID: Name of the storage

  • Storage Type: Choose AWS_S3 for S3

  • Description: Extended description of the storage for easier identification

  • AWS Bucket name: Name of the existing bucket in S3

  • AWS Region: Region where the bucket exists (the recommendation is that the Redshift cluster also exists in the same region)

  • Path Prefix: Path to the landing area within the AWS bucket

  • Custom endpoint: Optional parameter to specify the S3 VPC endpoint URL, for example: https://sample-bucket.bucket.vpce-0a2509460a648e95d-x8m26cz5.s3.eu-central-1.vpce.amazonaws.com

  • AWS Credentials = None: Authentication against AWS services is performed by the default credentials provider chain. When this option is set to None, the Java Connector and its AWS SDK search for AWS credentials on the hosting server. In the Assume Role = Assume Role scenario, these credentials are used to authenticate against the AWS Security Token Service.

  • AWS Credentials = Access key: Authentication using a generated access key pair:

    • AWS Access Key: Security information access_key_id - this option is required when AWS Credentials is set to Access key

    • AWS Secret Key: Security information secret_key_id - this option is required when AWS Credentials is set to Access key

  • Assume Role = None: No external AWS role is assumed; the authenticated user's own policies and attached permissions are used when accessing AWS resources.

  • Assume Role = Assume Role:

    • The authenticated user assumes an AWS role and receives temporary credentials for that role, which are then used when accessing AWS resources. The permissions attached to the assumed role are evaluated instead of the user's own permissions. Refer to the AWS AssumeRole documentation for more information.

    • RoleARN: ARN of the role to be assumed (example: arn:aws:iam::683735966288:role/dvd_s3_read_role_to_be_assumed). This option is required when Assume Role is not set to None.

    • ExternalID: External ID used to identify against the assumed role. A proper external ID must be set to successfully obtain temporary credentials from the STS AssumeRole API. This option is required when Assume Role is not set to None.

    • RoleSessionName: Identifier for the assumed role session. The role session name uniquely identifies a session when the same role is assumed by different principals or for different reasons, and can be used for logging in AWS. This option is required when Assume Role is not set to None.

    • AWS STS endpoint: Optional URL of the AWS STS endpoint. When specified, the regional AWS STS endpoint is used instead of the global endpoint.

    • AWS STS region: Region related to the specified endpoint. Mandatory if the AWS STS endpoint is specified.

  • Java connector RFC: TCP/IP RFC destination for communication with Java connector

  • Java call Repeat: Number of times failed calls should be retried

  • Repeat delay: Delay between retried calls

  • AWS KMS KeyID: Key ID of the key used to encrypt data on S3 (optional)

  • Compute hash: Calculate the hash of data

Complete the creation of the storage by confirming (F8).

Redshift

AWS Redshift is ideal as an archive for frequently accessed tabular data for SNP OutBoard™ Data Tiering or as analytics storage for SNP Glue™.

Redshift prerequisites

S3 storage

Redshift storage requires existing S3 storage to work. Make sure that you finished the steps in the S3 section and that you have working storage ready. 

Cluster and database

You must create a Redshift cluster, which will host your Redshift database. The sizing highly depends on the use case and the amount of data that will be stored there, so use the Help me choose option on the cluster creation page to properly size the cluster.

It is highly recommended that it shares a region with your S3 bucket.

Redshift schema and database user

Create a schema and a database user that Storage Management will use.

There are two supported authentication options for Redshift:

  1. Using Redshift user and password

  2. Using IAM credentials

Using IAM is recommended for production workloads.

In this example, the SID of the SAP system is DVQ, and the objects follow the recommended naming conventions. Adjust the SQL statements to fit your environment.

-- create user and schema
create user datavard_dvq password 'my-difficult-password';
create schema sapdvq;
alter schema sapdvq owner to datavard_dvq;
-- assign required permissions to system tables
grant select on pg_catalog.SVV_TABLE_INFO to datavard_dvq;
grant select on pg_catalog.PG_TABLE_DEF to datavard_dvq;

For accessing Redshift using IAM, the credentials used need to have the GetClusterCredentials permission, in addition to the standard Redshift authorizations.
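A sketch of an IAM identity policy granting this permission might look like the following; the region, account ID, cluster name (my-cluster), database name (mydb), and database user are placeholders to adapt:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "redshift:GetClusterCredentials",
      "Resource": [
        "arn:aws:redshift:eu-central-1:MY-ACCOUNT-ID:dbuser:my-cluster/datavard_dvq",
        "arn:aws:redshift:eu-central-1:MY-ACCOUNT-ID:dbname:my-cluster/mydb"
      ]
    }
  ]
}
```

The dbuser and dbname resource ARNs restrict which database user and database the temporary credentials can be issued for.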

JDBC Drivers

The JDBC protocol is used to connect to AWS Redshift. The AWS Redshift JDBC driver must be stored manually on the operating system and be accessible to the Java connector.

It is necessary to use a path that is available on all SAP application servers; the /sapmnt filesystem is ideal for this purpose. Make sure that <sid>adm:sapsys is the owner of the directory and its contents.

$ ls -ld /sapmnt/DVQ/global/security/dvd_conn/*

drwxr-xr-x 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/redshift

$ ls -l /sapmnt/DVQ/global/security/dvd_conn/redshift

-rw-r--r-- 1 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/redshift/RedshiftJDBC41-no-awssdk-1.2.16.1027.jar

Redshift storage in Storage Management

The AWS Redshift storage is created in a similar way to the S3 storage.

Open transaction /DVD/SM_SETUP > Create > Enter Storage ID and Storage Type (Redshift).


Entries explained:

  • Storage ID: Name of the storage

  • Storage Type: Choose REDSHIFT storage type

  • Description: Extended description of the storage for easier identification

  • Referenced Storage: Defines which S3 storage will be used by Redshift

  • Java connector RFC: RFC connection to Java connector

  • Redshift host: Redshift server hosting the Redshift service

  • Port: Port number on which Redshift service is accessible

  • Database name: Name of DB in Redshift cluster

  • Database schema: Name of the schema (usually public)

  • Enable update: If checked, delta loads will be merged into existing data

  • Use extended escaping: If checked, exotic newline characters in data are escaped

  • Driver engine: Use Amazon Redshift

  • Driver Classname: Classname of the driver used for loading; the value com.amazon.redshift.jdbc.Driver can commonly be used. The classes contained in the driver file can be checked with the jar tool, for example: jar tf RedshiftJDBC41-no-awssdk-1.2.16.1027.jar

  • Driver path: Logical name of the driver directory

  • Connection pool size: Number of connections that can be kept open in the pool, reducing resource-expensive establishment of JDBC connections.

  • Authentication method:

    • Username and password:

      • Username: Redshift user created in the Redshift user group

      • Password: Password for specified username

    • IAM Access and Secret keys:

      • Username: Database user to be used. If AutoCreate is enabled, the user does not need to exist beforehand.

      • IAM Access key: Access key for the IAM role or for IAM database authentication

      • IAM Secret key: Secret key for the IAM role or for IAM database authentication

    • IAM profile:

      • Username: Database user to be used. If AutoCreate is enabled, the user does not need to exist beforehand.

      • IAM Access key: Name of the profile to be used. If left initial, the default profile is used.

  • Login timeout (seconds): Threshold for JDBC timeout

  • Enable SSL: Checked if SSL authentication should be used

  • SSL Mode: There are two options for SSL mode:
    verify-ca (default option, verifies that the certificate comes from a trusted CA)
    verify-full (both CA and hostname listed in the certificate are verified)

Finish the creation of the storage by confirming (F8). If the SAP system can authenticate against AWS Redshift and receives the expected result of the SQL command use database, the creation of the storage is considered successful.

Enabling Debug logging for AWS SDK

As of JCO 235, you can enable detailed logging for the S3 connector, which exposes SDK internals in the Glue Java log. This can be useful for troubleshooting purposes but shouldn't be used extensively as it's very verbose.

To enable the detailed logging:

  1. Start the Java connector

  2. Select the checkbox Avoid JCO config files creation in t-code /DVD/JCO_MNG -> Advanced

  3. Add the AWS loggers configuration to the <Loggers> section of log4j.xml

  4. Restart JCO for changes to take effect.

    Example config
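A minimal sketch of such logger entries, assuming AWS SDK for Java v1 logger names (adjust levels as needed; wire logging is extremely verbose and may expose request details):

```xml
<Loggers>
  <!-- AWS SDK internals, including S3 request/response handling -->
  <Logger name="com.amazonaws" level="debug"/>
  <!-- Raw HTTP wire traffic (extremely verbose) -->
  <Logger name="org.apache.http.wire" level="debug"/>
</Loggers>
```

If your log4j.xml already contains a <Loggers> section, add only the <Logger> elements to it.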

     

To clean up after debugging, uncheck Avoid JCO config files creation and restart the JCO.