(SM-2105) S3/Redshift Storage Setup
S3 and Redshift storages can both be used as a storage layer for Datavard Glue or Outboard DataTiering. S3 provides inexpensive storage for landing zone or cold archive purposes. Redshift comes at a higher cost, but can be used as a directly queryable analytical storage or as a hot archive suitable for more aggressive SAP archiving strategies.
It is recommended to deploy AWS resources so that the development, test, and production environments are isolated. This means that when you archive or replicate data to S3, there should be one S3 bucket per SAP system. With Redshift, a typical deployment would be one cluster with separate databases for the development and quality environments, and a dedicated cluster for the production SAP environment.
While the setup itself is simple, experience shows that it usually takes around two weeks, since multiple teams (SAP Basis, network, AWS) need to be involved to fulfill the prerequisites.
The person responsible for the setup should have general knowledge of AWS S3 and AWS Redshift, SAP Basis, and networking basics. You will need to deploy resources on AWS and create access credentials, import Datavard transports into the SAP system, install a JRE and JDBC drivers on the SAP application servers, and make sure that the SAP and AWS environments can communicate.
General Prerequisites
SAP NetWeaver release
Datavard storage management requires SAP NW 7.01 SP15 or higher.
Open Ports
In a controlled network environment, it is common to have firewall rules in place. To enable communication between the SAP systems and AWS, outbound communication from the SAP system to the following ports on the AWS side must be allowed:
| Port | Type | AWS service |
|---|---|---|
| 5439 | tcp | Redshift |
| 80/443 | http/https | S3 |
Datavard Storage Management supports encrypted communication with S3 and Redshift over the public internet, but for production deployments it is recommended to have secure connectivity in place (e.g. a VPN).
Please refer to AWS documentation for more details.
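Before involving the SAP-side setup, it can save time to verify the firewall rules from a host in the SAP network. A minimal reachability check sketched in Python (the endpoint host names in the comment are placeholders for your own cluster and region):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical endpoints -- replace with your own cluster and region:
# port_reachable("my-cluster.abc123.eu-central-1.redshift.amazonaws.com", 5439)
# port_reachable("s3.eu-central-1.amazonaws.com", 443)
```

If the check fails while the firewall rules look correct, also verify DNS resolution and any proxy configuration between the SAP network and AWS.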
Java connector
The Java connector is a critical middleware component used for both S3 and Redshift storage. Please follow the steps in the article Java Connector Setup to set it up before you continue.
Make sure that your Java connector includes libraries with AWS SDK.
S3
This chapter describes steps to establish a connection to S3 storage.
S3 storage is usually used in the following scenarios:
- Landing zone for raw SAP data from Datavard Glue
- Archiving storage for Outboard DataTiering (SARA archives or tabular data when used as transparent binary storage)
- Intermediate storage used in combination with AWS Redshift
S3 prerequisites
S3 bucket
You must identify a suitable S3 bucket on AWS or manually create a new one in your AWS account based on your requirements. We recommend keeping public access disabled and enabling server-side encryption for added security. Datavard Storage Management supports both the SSE-S3 and SSE-KMS server-side encryption options.
Note down your S3 region, bucket name, and (optionally) KMS key ID, as this information will be required during the storage setup.
AWS User for programmatic access
We recommend creating a distinct user for every SAP system connected to the AWS services in order to isolate each system's data. Please refer to AWS documentation for best security practices.
You must also generate an access key pair (access key ID and secret access key) for this user. Make sure to rotate access keys regularly.
These credentials will be used in Storage Management for read/write access to the S3 bucket. If Redshift is used, they are also used to load/unload data from the Redshift cluster.
Root user
Never use the root account to provide access to your AWS account. Instead, create separate technical users for programmatic access with minimal authorizations.
S3 Policy
Assign an appropriate policy to the technical user so that it can access the S3 bucket. Please follow the principle of least privilege.
For a standard bucket without custom encryption keys, the following policy is sufficient.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::MY-ACCOUNT-ID:user/MY-USER"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::MY-BUCKET/*",
        "arn:aws:s3:::MY-BUCKET"
      ]
    }
  ]
}
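Before attaching the policy, it can be sanity-checked programmatically. A small sketch in Python that parses a policy document and collects the S3 actions it grants, so you can confirm no wildcard permissions slipped in (the embedded policy mirrors the example above, with MY-ACCOUNT-ID, MY-USER, and MY-BUCKET as placeholders):

```python
import json

POLICY = """
{ "Version": "2012-10-17",
  "Statement": [ { "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::MY-ACCOUNT-ID:user/MY-USER" },
    "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                "s3:ListBucket", "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts", "s3:AbortMultipartUpload" ],
    "Resource": [ "arn:aws:s3:::MY-BUCKET/*", "arn:aws:s3:::MY-BUCKET" ] } ] }
"""

def s3_actions(policy_text: str) -> set:
    """Collect all actions granted by 'Allow' statements in an IAM policy document."""
    doc = json.loads(policy_text)
    actions = set()
    for stmt in doc["Statement"]:
        if stmt["Effect"] == "Allow":
            acts = stmt["Action"]
            actions.update([acts] if isinstance(acts, str) else acts)
    return actions

# Least privilege: no wildcard action such as "s3:*" should appear.
assert "s3:*" not in s3_actions(POLICY)
```

This only inspects the policy text locally; the authoritative validation still happens when IAM accepts or rejects the document.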
S3 storage in Storage Management
Datavard Storage Management facilitates transparent communication with different types of storages.
Create S3 storage through the transaction:
/DVD/SM_SETUP
Entries explained:
- Storage ID – name of the storage
- Storage Type – choose AWS_S3 for S3
- Description – extended description of the storage for easier identification
- AWS Bucket name – name of the existing bucket in S3
- AWS Region – region where the bucket exists (it is recommended that the Redshift cluster also resides in the same region)
- AWS Access Key – the access key ID of the technical user
- AWS Secret Key – the secret access key of the technical user
- Java connector RFC – TCP/IP RFC destination for communication with Datavard Java connector
- Java call Repeat – number of times a failed call should be retried
- Repeat delay – delay between retried calls
- Avoid use of TMP files – performs data movement from SAP to Java in memory (checked by default)
- AWS KMS KeyID – Key ID of the key used to encrypt data on S3 (optional)
The path for TMP files must be visible to the Java connector instance. If your SAP system is a cluster consisting of multiple physical machines, you need to configure NFS (Network File System) so that all application servers write temporary data into one shared location that is visible to the Java connector instance. With this configuration, storage operations on the S3 storage work regardless of which SAP application server executes them. /sapmnt is usually an NFS directory shared among all SAP application servers.
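A quick way to confirm that a candidate TMP directory is usable from a given application server is to try creating a file in it. A trivial sketch (the /sapmnt path in the comment is an example; run the check on every application server in the cluster):

```python
import os
import tempfile

def tmp_dir_writable(path: str) -> bool:
    """Check that the shared TMP directory exists and a file can be created in it."""
    if not os.path.isdir(path):
        return False
    try:
        # NamedTemporaryFile creates and deletes a real file in the directory.
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False

# Example with a hypothetical shared path:
# tmp_dir_writable("/sapmnt/DVQ/global/dvd_tmp")
```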
Complete the creation of the storage by confirming (F8).
Redshift
AWS Redshift is ideal as an archive for frequently accessed tabular data for Datavard Outboard DataTiering or as analytics storage for Datavard Glue.
Redshift prerequisites
S3 storage
Redshift storage requires an existing S3 storage to work. Make sure that you have finished the steps in the S3 section and have a working S3 storage ready.
Cluster and database
You must create a Redshift cluster that will host your Redshift database. Sizing depends heavily on the use case and the amount of data to be stored, so please use the "Help me choose" option on the cluster creation page to size the cluster properly.
It is highly recommended that it shares a region with your S3 bucket.
Redshift schema and database user
Create a schema and a database user that will be used by Datavard Storage Management.
In this example, the SID of the SAP system is DVQ and the statements follow the recommended naming conventions. Adjust the SQL statements to fit your environment.
-- create user and schema
create user datavard_dvq password 'my-difficult-password';
create schema sapdvq;
alter schema sapdvq owner to datavard_dvq;

-- assign required permissions to system tables
grant select on pg_catalog.SVV_TABLE_INFO to datavard_dvq;
grant select on pg_catalog.PG_TABLE_DEF to datavard_dvq;
JDBC Drivers
The JDBC protocol is used to connect to AWS Redshift. The AWS Redshift JDBC driver (e.g. RedshiftJDBC41-no-awssdk-1.2.16.1027.jar) must be manually stored on the operating system and be accessible to the Datavard Java connector.
It is recommended to use the default path as in the example below to utilize the predefined logical paths in SAP. Make sure that <sid>adm:sapsys is the owner of the directory dvd_conn and all of its contents.
$ ls -ld /sapmnt/DVQ/global/security/dvd_conn/*
drwxr-xr-x 2 dvqadm sapsys 4096 --- /sapmnt/DVQ/global/security/dvd_conn/redshift
$ ls -l /sapmnt/DVQ/global/security/dvd_conn/redshift
-rw-r--r-- 1 dvqadm sapsys --- RedshiftJDBC41-no-awssdk-1.2.16.1027.jar
Redshift storage in Storage Management
An AWS Redshift storage is created in a similar way to the S3 storage.
Open transaction /DVD/SM_SETUP, choose Create, and enter a Storage ID and the Storage Type "REDSHIFT".
Entries explained:
- Storage ID – Name of the storage
- Storage Type – Choose REDSHIFT storage type
- Description – Extended description of the storage for easier identification
- Referenced Storage – Defines which S3 storage will be used by Redshift
- Java connector RFC – RFC connection to Datavard Java connector
- Redshift host – Redshift server hosting the Redshift service
- Port – port number on which Redshift service is accessible
- Database name – Name of DB in Redshift cluster
- Database schema – Name of the schema (normally public)
- Enable update - If checked, delta loads will be merged into existing data
- Use extended escaping - If checked, exotic newline characters in data are escaped
- Driver engine – Use Amazon Redshift
- Driver Classname – Classname of the driver used for loading (com.amazon.redshift.jdbc41.Driver for the driver version above)
- Driver path – Logical name of the driver directory
- Username – the Redshift database user created earlier
- Password – Type in the password in the lower line and use the [Hash] button
- Password hashed - check if you stored the password in encrypted form
- Login timeout (seconds) – Threshold for JDBC timeout
- Enable SSL – Checked if SSL authentication should be used
- SSL Mode – There are two options for SSL mode:
  - verify-ca (default option; verifies that the certificate comes from a trusted CA)
  - verify-full (both the CA and the hostname listed in the certificate are verified)
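The host, port, database, and SSL entries above combine into a standard Redshift JDBC connection URL. A sketch of how such a URL is typically composed (the cluster host name in the example is a placeholder, and the exact SSL parameter names accepted should be checked against the driver version in use):

```python
def redshift_jdbc_url(host: str, port: int, database: str,
                      ssl: bool = True, ssl_mode: str = "verify-ca") -> str:
    """Compose a JDBC connection URL for the Amazon Redshift driver."""
    url = f"jdbc:redshift://{host}:{port}/{database}"
    if ssl:
        url += f"?ssl=true&sslmode={ssl_mode}"
    return url

# Hypothetical cluster endpoint:
print(redshift_jdbc_url(
    "my-cluster.abc123.eu-central-1.redshift.amazonaws.com", 5439, "dvq"))
```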
Finish the creation of the storage by confirming (F8). If the SAP system can authenticate against AWS Redshift and receives the expected result of the SQL command 'use database', the creation of the storage is considered successful.