(SM-2302) Hadoop Storage Setup Old
Prerequisites
Open ports
The following ports need to be open from the SAP system (with OTB) towards the Hadoop cluster:
Port | Hadoop service |
---|---|
10000 | Hive |
14000 | HttpFS |
21050 | Impala |
Hive parameters
Two configuration parameters of the Hive service need to be set in the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
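After the change is deployed, the effective values can be checked from beeline; a minimal sketch, assuming HiveServer2 listens on <hive-host>:10000 (on a Kerberized cluster the JDBC URL additionally needs the HiveServer2 principal):
$ beeline -u "jdbc:hive2://<hive-host>:10000/default" -n dvqhdp
> SET hive.exec.dynamic.partition;
> SET hive.exec.dynamic.partition.mode;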
Hadoop user
SNP recommends creating a distinct user for every SAP system connected to the Hadoop cluster in order to isolate each system's data.
Usually, there is a central repository for Hadoop users (LDAP/AD) but you can also create the user locally.
Each of these users needs to have its own dedicated user group.
If Hadoop Sentry is used: User groups will be used for the definition of Sentry access rules.
The recommended user name mirrors SAP's naming convention for user names (<sid>adm): <sid>hdp.
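If the user is created locally rather than in LDAP/AD, a minimal sketch for SID DVQ could look as follows (the UID/GID is only a placeholder; the same definition must be repeated on every Hadoop cluster node):
$ groupadd -g 11000 dvqhdp
$ useradd -u 11000 -g dvqhdp -m dvqhdp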
Create the user's Kerberos principal in the form of <sid>hdp@<KERBEROS_REALM>.
The user's home directory on HDFS has to be manually created with appropriate permissions:
$ hdfs dfs -ls -d /user/dvqhdp
-rwxrwxr-x 3 dvqhdp supergroup 0 2017-02-24 11:05 /user/dvqhdp
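A minimal sketch of creating this directory for SID DVQ, run as the HDFS superuser (on a Kerberized cluster, obtain a ticket for the HDFS superuser first):
$ sudo -u hdfs hdfs dfs -mkdir /user/dvqhdp
$ sudo -u hdfs hdfs dfs -chown dvqhdp /user/dvqhdp
$ sudo -u hdfs hdfs dfs -chmod 775 /user/dvqhdp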
If the user is created locally, it has to be defined identically on every Hadoop cluster node.
If Kerberos is used: Create a Kerberos principal for the user. You need to run kadmin.local on the host where Kerberos DB is running:
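A minimal sketch, assuming SID DVQ, the realm HADOOP.LOCAL used in the krb5.conf example later in this guide, and an arbitrary keytab location (the keytab is needed later for the logical file definition):
$ kadmin.local
kadmin.local:  addprinc -randkey dvqhdp@HADOOP.LOCAL
kadmin.local:  ktadd -k /etc/security/keytabs/dvqhdp.keytab dvqhdp@HADOOP.LOCAL
kadmin.local:  quit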
Hive database
SNP recommends creating a dedicated database (schema) in Hive for each SAP system. The recommended database name is sap<sid> (sapdvq).
If Hadoop Sentry is used: two Sentry rules have to be created that allow the <sid>hdp user all actions on the sap<sid> database and on its home directory in HDFS.
If HDFS ACL synchronization with Sentry permissions is enabled, the user's home directory has to be added to the Sentry Synchronization Path Prefixes parameter in the HDFS service configuration.
More on HDFS ACL synchronization can be found at https://www.cloudera.com/documentation/enterprise/latest/topics/sg_hdfs_sentry_sync.html
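A minimal sketch of the database creation and the two Sentry rules for SID DVQ, executed in beeline by a user with Sentry admin privileges (the role name dvq_role and the fully qualified HDFS URI are assumptions; adjust them to your environment):
$ beeline -u "jdbc:hive2://<hive-host>:10000/default"
> CREATE DATABASE sapdvq;
> CREATE ROLE dvq_role;
> GRANT ALL ON DATABASE sapdvq TO ROLE dvq_role;
> GRANT ALL ON URI 'hdfs://<namenode>:8020/user/dvqhdp' TO ROLE dvq_role;
> GRANT ROLE dvq_role TO GROUP dvqhdp;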
SAP system copy
To ensure correct functionality of SNP Outboard™ and/or SNP Glue™ with Hadoop after an SAP system copy, both the Hive metastore and the HDFS data (the hive and user folders) need to be copied to the new environment. The storage then needs to be configured to point to the copied data.
OS prerequisites (underneath the SAP system)
Java version
Java 1.7 or higher needs to be installed and available to the <sid>adm user.
Another option is to copy a JRE (version 1.7 or higher) to a folder that is accessible to the <sid>adm user.
We found that when connecting to a Kerberized cluster, the base Java release (whether 1.7.0 or 1.8.0) does not work well; there is a problem with the authentication method.
To connect to a Kerberized cluster, it is therefore necessary to use a patched Java version (e.g. 1.8.0_102).
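To check which Java version is actually visible to the <sid>adm user (SID DVQ assumed):
$ su - dvqadm -c "java -version"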
SAP Java Connector Library
SAP Java Connector 3.0 library needs to be accessible. It can be downloaded from the SAP marketplace. The SAR file contains two files:
libsapjco3.so and sapjco3.jar. The directory containing libsapjco3.so needs to be included in the LD_LIBRARY_PATH environment variable of the <sid>adm user.
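A minimal sketch of extracting the archive and exposing the native library to the <sid>adm user, assuming SID DVQ and the /usr/sap/DVQ/dvd_conn directory described in the next section (the archive name is illustrative):
$ SAPCAR -xvf <SAPJCO3_archive>.SAR        # extracts sapjco3.jar and libsapjco3.so
$ cp libsapjco3.so sapjco3.jar /usr/sap/DVQ/dvd_conn/
$ export LD_LIBRARY_PATH=/usr/sap/DVQ/dvd_conn:$LD_LIBRARY_PATH   # add this line to the dvqadm shell profile to make it permanent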
OS directories
Create directories on each SAP application server to store the Storage Management related configuration and log files. The <sid>adm user needs permission to access these directories.
/usr/sap/<sid>/dvd_conn
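A minimal sketch for SID DVQ (sapsys is assumed to be the primary group of the <sid>adm user):
$ mkdir -p /usr/sap/DVQ/dvd_conn
$ chown dvqadm:sapsys /usr/sap/DVQ/dvd_conn
$ chmod 755 /usr/sap/DVQ/dvd_conn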
Kerberos config file
Create a Kerberos configuration file to be used by the SAP backend user (it can be copied from /etc/krb5.conf on a Hadoop cluster host):
Example:
[libdefaults]
default_realm = HADOOP.LOCAL
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = rc4-hmac
default_tkt_enctypes = rc4-hmac
permitted_enctypes = rc4-hmac
udp_preference_limit = 1
kdc_timeout = 3000

[realms]
HADOOP.LOCAL = {
  kdc = skbtscck21.hadoop.local
  admin_server = skbtscck21.hadoop.local
}
SAP prerequisites
Kerberos cookie encoding
If the Hadoop cluster is kerberized, cookie encoding needs to be disabled in every application server's profile (instance profile):
ict/disable_cookie_urlencoding = 1
RFC user
A dedicated RFC user has to be created. The user type Communications Data is sufficient. This user has to have a role with all authorizations for RFC.
Configuration
RFCs
HttpFS RFC
Create HttpFS RFC destination:
Transaction SM59 > [Create]
RFC Destination: the name is up to the customer.
Connection type: G (HTTP Connection to External Service).
Target host: the Hadoop node running the HttpFS service.
Service No.: 14000 by default, or any other port the HttpFS service is listening on.
Path Prefix: /webhdfs/v1/user/<sid>hdp. The /webhdfs/v1 part is fixed; the following part of the path (in this case /user/<sid>hdp) is the HDFS path where data uploaded from the configured SAP system will be stored.
[Save]
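The HttpFS endpoint behind this destination can be tested from the OS level; a minimal sketch for an unkerberized cluster (on a Kerberized cluster, obtain a ticket first and replace the user.name parameter with curl --negotiate -u :):
$ curl "http://<httpfs-host>:14000/webhdfs/v1/user/dvqhdp?op=GETFILESTATUS&user.name=dvqhdp"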
Hive RFC
Create TCP/IP RFC destination:
RFC Destination: This RFC will always communicate with the Java connector and as such its name can remain the default HADOOP_HIVE_CONN.
Connection type: T (TCP/IP Connection).
(o) Registered Server Program.
Program ID: HIVE_CONN.
[Save]
Authentication RFC
IF KERBEROS IS USED: Create TCP/IP RFC destination:
RFC Destination: This RFC will supply Kerberos credentials and as such its name can remain the default HADOOP_AUTH_CONN.
Connection type: T (TCP/IP Connection).
(o) Registered Server Program.
Program ID: AUTH_CONN.
[Save]
Storage Management setup
Kerberos logical file definition
IF KERBEROS IS USED: Two files have to be defined in the SAP system: the Kerberos configuration file and the SAP backend user's keytab. If Cloudera drivers are used, the Cloudera driver configuration file has to be defined as well (three files in total).
First, define the logical path under which these files are located.
Transaction: FILE > Logical File Path Definition > [New Entries]
Logical file path: ZHADOOP_SECURITY
[Save] (create/assign own workbench request if necessary)
Select the newly created logical path and double-click Assignment of Physical Paths to Logical Path
[New Entries]
Syntax group: UNIX
[Save]
NOTE: The <sid>adm user needs read and write permissions for this folder.
Now that the path is defined, select Logical File Name Definition, Cross-Client > [New Entries] and define two (three in case of using Cloudera drivers) logical files:
One for the keytab file, one for the Kerberos configuration file, and optionally one for the Cloudera driver configuration file.
Java connector configuration
Several configuration tables are imported to the system with SNP transports.
The /DVD/HDP_CUS_C table supplies credentials for authentication against the Hadoop HttpFS service.
If the authentication is not enforced (no Kerberos), run:
SM30 > Table/View: /DVD/HDP_CUS_C > [Maintain] > [New Entries]
IF KERBEROS IS USED:
The /DVD/JAVA_CONFIG table stores the parameters of the Java connector.
It can be edited via the report /DVD/SM_HIVE_EDIT_CONFIG.
Fill in this table for version 34.
IF KERBEROS IS USED: fill in the table for version 32 as well.
Parameter name | Parameter value |
---|---|
PASSWORD | Password for the RFC user created in the Prerequisites chapter. |
USERNAME | Username for the RFC user. |
PROG_ID | Program ID of the RFC destination created in the Configuration chapter. |
REP_DEST | ABAP_AS_WITH_POOL |
MAX_RAM_USED | For the authorization service, we recommend 50M (50 MB); for the Hive connection, at least 1G (1 GB), more if possible. |
WORK_THREAD_MIN | 5 |
WORK_THREAD_MAX | 10 |
CONN_COUNT | 10 |
PEAK_LIMIT | 10 |
CLIENT | Client of the RFC user |
CONFIG_AS_PATH | Path to the config file (the path we specified in the Prerequisites chapter plus the filename). "<path>/[hdp/hive]_java_to_sap.cfg" |
LOG_FILE | Path to a log file (the path we specified in the Prerequisites chapter plus filename). "<path>/[hdp/Hive]_auth.log" |
CONFIG_PATH | Path to the config file (the path we specified in the Prerequisites chapter plus filename). "<path>/[hdp/hive]_auth_conn.cfg" |
JAR_PATH | Path to .jar file of the service (the path we specified in the Prerequisites chapter plus the filename). "<path>/DVD[Http/Hive]Conn.jar" |
JAVA_EXE | Path to Java binary executable. |
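As a purely illustrative example, assuming SID DVQ, the /usr/sap/DVQ/dvd_conn directory from the OS prerequisites, and a Java installation under /usr/lib/jvm/jre-1.8.0, the path-related parameters of the Hive connector could look like this:
JAVA_EXE = /usr/lib/jvm/jre-1.8.0/bin/java
JAR_PATH = /usr/sap/DVQ/dvd_conn/DVDHiveConn.jar
CONFIG_PATH = /usr/sap/DVQ/dvd_conn/hive_auth_conn.cfg
CONFIG_AS_PATH = /usr/sap/DVQ/dvd_conn/hive_java_to_sap.cfg
LOG_FILE = /usr/sap/DVQ/dvd_conn/hive_auth.log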
Create storage
Create Hive storage in Storage Management (Hive/Impala transparent storage v2).
/DVD/OUTBOARD > [Settings] > [Storage Management] > [Edit mode] > [New Storage]
IF KERBEROS IS USED:
[Confirm]
The expected system response is that the newly created storage was checked successfully.
Impala Storage
Similarly to the Hive storage, create an Impala storage:
/DVD/OUTBOARD > [Settings] > [Storage Management] > [Edit mode] > [New Storage]
The expected system response is that the newly created storage was checked successfully.