(SM-2405) S3/Redshift Storage Setup
S3, Redshift, and Redshift Serverless storages can all be used as a storage layer for SNP Glue™ or SNP OutBoard™ ERP Archiving.
Of these, Redshift Serverless offers the best price/performance for most use cases involving structured data.
S3 provides inexpensive storage for landing zones, cold archive storage purposes, and unstructured data.
Redshift with a dedicated cluster is useful for use cases with high, continuous loads.
It is recommended to deploy AWS resources in a way that the development, test, and production environments are isolated. This means that when you archive or replicate data to S3, there should be one S3 bucket per SAP system. With a Redshift deployment, a typical scenario would be one cluster with separate databases for the development and quality environments, and one cluster for the production SAP environment.
While the setup itself is simple, experience shows that it usually takes around two weeks, since multiple teams (SAP Basis, network, AWS) need to be involved to fulfill the prerequisites.
The person responsible for the setup should have general knowledge of AWS S3, AWS Redshift, SAP Basis, and networking basics. You will need to deploy resources on AWS and create access credentials, import our transports to the SAP system, install the JRE and JDBC drivers on the SAP application servers, and make sure that the SAP and AWS environments can communicate.
General Prerequisites
SAP NetWeaver release
Storage management requires SAP NW 7.01 SP15 or higher.
Open Ports
In a controlled network environment, it is common to have firewall rules in place. To enable communication between SAP systems and AWS, outbound communication from the SAP system to the following ports on the AWS side needs to be allowed:
| Port | Type | AWS service |
|---|---|---|
| 5439 | TCP | Redshift |
| 80/443 | HTTP/HTTPS | S3 |
Example of a simple telnet connectivity test:
sapserver01:/ # telnet s3.eu-central-1.amazonaws.com 443
Trying 3.5.139.101...
Connected to s3.eu-central-1.amazonaws.com.
Escape character is '^]'.
^]
telnet> q
Connection closed.
Storage Management allows encrypted communication through the public internet with S3 and Redshift, but for production deployment, it is recommended to have some kind of secure connectivity in place (VPN).
Refer to AWS documentation for more details.
Java connector
The Java connector is a critical middleware component used for both S3 and Redshift storage. Follow the steps in the chapter Java Connector Setup to set it up before you continue.
S3
This chapter describes the steps to establish a connection to S3 storage.
S3 storage is usually used in the following scenarios:
Landing zone for raw SAP data from SNP Glue™
Archiving storage for SNP OutBoard™ Data Tiering (SARA archives or tabular data when used as transparent binary storage)
Intermediate storage used in combination with AWS Redshift
S3 Prerequisites
S3 bucket
You must identify a suitable S3 bucket on AWS or create a new one in your AWS account based on your requirements. We recommend keeping public access disabled and enabling server-side encryption for added security. Storage Management supports both SSE-S3 and SSE-KMS server-side encryption options.
Note down your S3 region, bucket name, and KMS key ID (optional) as this information will be required during storage setup.
If you use a customer-managed KMS key, make sure to update the key policy to allow key usage for the user/role (principal) accessing the S3 bucket.
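If a new bucket needs to be created, a minimal AWS CLI sketch could look like the following (bucket name, region, and KMS key ID are placeholders; adjust them to your environment and naming conventions):

# Create the bucket in the target region (one bucket per SAP system)
aws s3api create-bucket --bucket my-sap-dvq-bucket --region eu-central-1 \
    --create-bucket-configuration LocationConstraint=eu-central-1

# Keep public access disabled
aws s3api put-public-access-block --bucket my-sap-dvq-bucket \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Enable SSE-KMS server-side encryption (omit KMSMasterKeyID to use SSE-S3 instead)
aws s3api put-bucket-encryption --bucket my-sap-dvq-bucket \
    --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"MY-KMS-KEY-ID"}}]}'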
AWS User for programmatic access
We recommend creating a distinct user for every SAP system connected to the AWS services to isolate each system's data. Refer to AWS documentation for best security practices.
A credential pair needs to be generated (an access key ID and a secret access key). Make sure to rotate access keys regularly.
These credentials will be used in Storage Management for read/write access to the S3 bucket. If Redshift is used, it will also be used to load/unload data from the Redshift cluster.
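As an illustration, a dedicated technical user and its credential pair could be created with the AWS CLI roughly as follows (the user name is a placeholder; attach a least-privilege policy such as the one in the S3 Policy section before use):

# Create a dedicated technical user for one SAP system
aws iam create-user --user-name dvq-storage-user

# Generate the programmatic credentials (note down AccessKeyId and SecretAccessKey)
aws iam create-access-key --user-name dvq-storage-user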
Root user
Never use the root account to provide access to your AWS account. Instead, create separate technical users for programmatic access with minimal authorizations. Note that the root account cannot assume any role (an AWS restriction), so it will not work with the AssumeRole option.
S3 Policy
Assign an appropriate policy to the technical user so it can access the S3 bucket, following the principle of least privilege.
For a standard bucket without custom encryption keys, the following policy is sufficient.
Basic bucket policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::MY-ACCOUNT-ID:user/MY-USER"
            },
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::MY-BUCKET/*",
                "arn:aws:s3:::MY-BUCKET"
            ]
        }
    ]
}
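Assuming the policy above is saved as bucket-policy.json (with MY-ACCOUNT-ID, MY-USER, and MY-BUCKET replaced), it can be attached to the bucket, for example, with:

aws s3api put-bucket-policy --bucket MY-BUCKET --policy file://bucket-policy.json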
S3 Storage in Storage Management
Storage Management facilitates transparent communication with different types of storages.
Create S3 storage through the transaction:
/DVD/SM_SETUP
Entries explained:
Storage ID: Name of the storage
Storage Type: Choose AWS_S3 for S3
Description: Extended description of the storage for easier identification
AWS Bucket name: Name of the existing bucket in S3
AWS Region: Region where the bucket exists (the recommendation is that the Redshift cluster also exists in the same region)
Path Prefix: Path to the landing area within the AWS bucket
Custom endpoint: Optional parameter to specify the S3 VPC endpoint URL, for example: https://sample-bucket.bucket.vpce-0a2509460a648e95d-x8m26cz5.s3.eu-central-1.vpce.amazonaws.com
AWS Credentials = None: Authentication against AWS services is performed by the default credentials provider chain, i.e. the Java Connector and its AWS SDK search for AWS credentials on the hosting server. If Assume Role = Assume Role is set, these credentials are used to authenticate against the AWS Security Token Service.
AWS Credentials = Access key: Authentication uses an explicit credential pair:
AWS Access Key: The access key ID (access_key_id); required when AWS Credentials is set to Access key
AWS Secret Key: The secret access key; required when AWS Credentials is set to Access key
Assume Role = None: No external AWS role is assumed; the authenticated user accesses Amazon resources with its own policies and attached permissions
Assume Role = Assume Role: The authenticated user assumes an AWS role and receives temporary credentials for it; access to Amazon resources is then evaluated against the permissions attached to the assumed role instead of the user's own permissions. Refer to AWS documentation for more details; an example trust policy for the assumed role is shown at the end of this chapter.
RoleARN: ARN of the role to be assumed (example: arn:aws:iam::683735966288:role/dvd_s3_read_role_to_be_assumed); required when Assume Role is not set to None
ExternalID: External ID used to identify against the assumed role; the correct external ID must be provided to obtain temporary credentials from the STS AssumeRole API; required when Assume Role is not set to None
RoleSessionName: Identifier for the assumed role session, used to uniquely identify a session when the same role is assumed by different principals or for different reasons; it can also be used for logging on the AWS side; required when Assume Role is not set to None
AWS STS endpoint: Optional URL of an AWS STS endpoint; when specified, the regional AWS STS endpoint is used instead of the global endpoint
AWS STS region: Region related to the specified endpoint; mandatory if the AWS STS endpoint is specified
Java connector RFC: TCP/IP RFC destination for communication with Java connector
Java call Repeat: Number of times failed calls should be retried
Repeat delay: Delay between retried calls
AWS KMS KeyID: Key ID of the key used to encrypt data on S3 (optional)
Compute hash: Calculate the hash of data
Complete the creation of the storage by confirming (F8).
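If the Assume Role option is used, the assumed role must also trust the technical user that authenticates against STS. A minimal sketch of such a trust policy with an external ID condition (account ID, user name, and external ID are placeholders) could look like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::MY-ACCOUNT-ID:user/MY-USER"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "MY-EXTERNAL-ID"
                }
            }
        }
    ]
}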
Redshift
AWS Redshift is ideal as an archive for frequently accessed tabular data for SNP OutBoard™ Data Tiering or as analytics storage for SNP Glue™.
Redshift prerequisites
S3 storage
Redshift storage requires an existing S3 storage to work. Make sure that you have finished the steps in the S3 chapter and have a working storage ready.
Cluster and database
You must create a Redshift cluster, which will host your Redshift database. The sizing highly depends on the use case and the amount of data that will be stored there, so use the Help me choose option on the cluster creation page to size the cluster properly.
It is highly recommended that the cluster shares a region with your S3 bucket.
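For illustration only, a small cluster could be created with the AWS CLI along these lines (identifier, node type, node count, database name, and credentials are placeholders and should follow your own sizing decision):

aws redshift create-cluster --cluster-identifier dvq-archive-cluster \
    --node-type ra3.xlplus --number-of-nodes 2 \
    --db-name dvqdb --master-username awsadmin \
    --master-user-password 'my-difficult-password'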
Redshift schema and database user
Create a schema and a database user that Storage Management will use.
There are two supported authentication options for Redshift:
Using Redshift user and password
Using IAM credentials
Using IAM is recommended for production workloads.
In this example, the SID of the SAP system is DVQ and the object names follow the recommended naming conventions. Adjust the SQL statements to fit your environment.
--create user and schema
create user datavard_dvq password 'my-difficult-password';
create schema sapdvq;
alter schema sapdvq owner to datavard_dvq;
-- assign required permissions to system tables
grant select on pg_catalog.SVV_TABLE_INFO to datavard_dvq;
grant select on pg_catalog.PG_TABLE_DEF to datavard_dvq;
To access Redshift using IAM, the credentials used need to have the GetClusterCredentials permission in addition to the standard Redshift authorizations.
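A minimal sketch of such an IAM policy could look like the following (region, account ID, cluster name, database name, and database user are placeholders; include redshift:CreateClusterUser only if AutoCreate is used):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials",
                "redshift:CreateClusterUser"
            ],
            "Resource": [
                "arn:aws:redshift:eu-central-1:MY-ACCOUNT-ID:dbuser:my-cluster/datavard_dvq",
                "arn:aws:redshift:eu-central-1:MY-ACCOUNT-ID:dbname:my-cluster/dvqdb"
            ]
        }
    ]
}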
JDBC Drivers
The JDBC protocol is used to connect to AWS Redshift. The AWS Redshift JDBC driver must be manually stored on the operating system and be accessible to the Java connector.
Use a path that is available on all SAP application servers; the /sapmnt filesystem is ideal for this purpose. Make sure that <sid>adm:sapsys is the owner of the directory and its contents.
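A possible sequence on the operating system level, assuming a hypothetical /sapmnt/<SID>/global/drivers directory for the DVQ example system (the driver file name depends on the version you download):

# Create a driver directory visible to all application servers
mkdir -p /sapmnt/DVQ/global/drivers

# Copy the downloaded Redshift JDBC driver
cp redshift-jdbc42-2.1.0.30.jar /sapmnt/DVQ/global/drivers/

# Make <sid>adm the owner of the directory and its contents
chown -R dvqadm:sapsys /sapmnt/DVQ/global/drivers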
Redshift storage in Storage Management
The AWS Redshift storage is created in a similar way to the process of setting up the S3 storage.
Open transaction /DVD/SM_SETUP > Create > Enter Storage ID and Storage Type (Redshift).
Entries explained:
Storage ID: Name of the storage
Storage Type: Choose REDSHIFT storage type
Description: Extended description of the storage for easier identification
Referenced Storage: Defines which S3 storage will be used by Redshift
Java connector RFC: RFC connection to Java connector
Redshift host: Hostname of the server (cluster endpoint) hosting the Redshift service
Port: Port number on which Redshift service is accessible
Database name: Name of DB in Redshift cluster
Database schema: Name of the schema (usually public)
Enable update: If checked, delta loads will be merged into existing data
Use extended escaping: If checked, exotic newline characters in data are escaped
Driver engine: Use Amazon Redshift
Driver Classname: Classname of the driver used for loading; the value com.amazon.redshift.jdbc.Driver can commonly be used. The classes contained in the driver file can be checked with the command:
unzip -l <driver_file>.jar | grep Driver.class
Driver path: Logical name of the driver directory
Connection pool size: Number of connections that can be kept open in the pool, reducing resource-expensive establishment of JDBC connections.
Authentication method:
Username and password:
Username: Redshift user created in the Redshift user group
Password: Password for specified username
IAM credentials:
Username: Database user to be used. If AutoCreate is enabled, the user does not need to exist beforehand.
IAM Access key: Access key for the IAM role or for IAM database authentication
IAM Secret key: Secret key for the IAM role or for IAM database authentication
IAM profile:
Username: Database user to be used. If AutoCreate is enabled, the user does not need to exist beforehand.
IAM profile name: Name of the profile to be used. If left initial, the default profile is used.
Login timeout (seconds): Threshold for JDBC timeout
Enable SSL: Checked if SSL authentication should be used
SSL Mode: There are two options for SSL mode:
verify-ca (default option, verifies that the certificate comes from a trusted CA)
verify-full (both CA and hostname listed in the certificate are verified)
Finish the creation of the storage by confirming (F8). If the SAP system can authenticate against AWS Redshift and receives the expected result of the SQL command use database, the creation of the storage is considered successful.
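If the check fails, network connectivity and credentials can be verified directly from an SAP application server with a PostgreSQL client, if one is available (cluster endpoint and database name are placeholders):

psql -h my-cluster.abc123xyz789.eu-central-1.redshift.amazonaws.com -p 5439 -d dvqdb -U datavard_dvq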
Enabling Debug logging for AWS SDK
As of JCO 235, you can enable detailed logging for the S3 connector, which exposes SDK internals in the Glue Java log. This can be useful for troubleshooting purposes but shouldn't be used extensively as it's very verbose.
To enable the detailed logging:
1. Start the Java connector.
2. In transaction /DVD/JCO_MNG > Advanced, check the Avoid JCO config files creation checkbox.
3. Add the AWS loggers config to the <Loggers> section of log4j.xml.
4. Restart the JCO for the changes to take effect.
Example config:
<Loggers>
    ...
    <!-- some s3 custom logging -->
    <Logger name="com.amazonaws" level="TRACE" additivity="false">
        <AppenderRef ref="RollingFile" />
    </Logger>
    <Logger name="org.apache.http" level="TRACE" additivity="false">
        <AppenderRef ref="RollingFile" />
    </Logger>
</Loggers>
To clean up after debugging, uncheck the Avoid JCO config files creation checkbox and restart the JCO.