...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Prerequisites
Open ports
In a controlled network environment, it is common to have firewall rules in place. To enable communication between SAP systems and Hadoop, the following ports on the Hadoop cluster need to be reachable from the SAP system:
...
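Whether a given port is reachable can be checked from the SAP host, for example with nc. The hostname below is illustrative; 10000 is the default HiveServer2 port and 14000 the default HttpFS port:

$ nc -zv hadoopnode.example.com 10000
$ nc -zv hadoopnode.example.com 14000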
Proper DNS name translation needs to be configured between SAP and Hadoop for Kerberos communication.
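A quick sanity check is that forward and reverse lookups agree on both sides; the hostname and address below are illustrative:

$ getent hosts hadoopnode.example.com
10.0.0.15       hadoopnode.example.com
$ getent hosts 10.0.0.15
10.0.0.15       hadoopnode.example.com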
Hive parameters
Two configuration parameters of the Hive service need to be set in the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
Example:
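Once the Safety Valve change has been deployed and the Hive service restarted, the effective values can be verified from a Hive session; the connection URL below is illustrative:

$ beeline -u "jdbc:hive2://hiveserver.example.com:10000/default" \
    -e "SET hive.exec.dynamic.partition; SET hive.exec.dynamic.partition.mode;"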
Hadoop user
Datavard recommends creating a distinct user for every SAP system connected to the Hadoop cluster in order to isolate each system's data.
Hadoop users are usually kept in a central repository (LDAP/AD), but the user can also be created locally (on every Hadoop cluster node).
Each of these users must have its own dedicated user group.
If Sentry is used: the user groups are used in the definition of Sentry access rules.
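A minimal sketch of the local variant, assuming the SID is DVQ and the user and group are both named dvqhdp (run the OS commands on every cluster node):

$ sudo groupadd dvqhdp
$ sudo useradd -g dvqhdp -m dvqhdp
$ sudo -u hdfs hdfs dfs -mkdir -p /user/dvqhdp
$ sudo -u hdfs hdfs dfs -chown dvqhdp /user/dvqhdp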
...
$ hdfs dfs -ls -d /user/dvqhdp
-rwxrwxr-x   3 dvqhdp supergroup          0 --- /user/dvqhdp
If Kerberos is used: create a Kerberos principal for the user. You need to run kadmin.local on the host where the Kerberos database is located.
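A sketch of the principal and keytab creation, assuming the user dvqhdp, the realm EXAMPLE.COM, and an illustrative keytab path:

$ sudo kadmin.local
kadmin.local:  addprinc -randkey dvqhdp@EXAMPLE.COM
kadmin.local:  xst -norandkey -k /etc/security/keytabs/dvqhdp.keytab dvqhdp@EXAMPLE.COM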
Hive database
Datavard recommends creating a dedicated database (schema) in Hive for each SAP system. The recommended database name is sap<sid> (e.g. sapdvq).
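The database can be created, for example, with beeline; the connection URL is illustrative:

$ beeline -u "jdbc:hive2://hiveserver.example.com:10000/default" \
    -e "CREATE DATABASE IF NOT EXISTS sapdvq;"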
...
SAP Java Connector library
The SAP Java Connector 3.0 library libsapjco3.so needs to be accessible; it is available for download from the SAP marketplace. It must be present in the SAP kernel directory or in another directory referenced by the LD_LIBRARY_PATH environment variable of the <sid>adm user.
$ which libsapjco3.so
/usr/sap/DVQ/SYS/exe/uc/linuxx86_64/libsapjco3.so
OS directories
The Hadoop connector uses two directories dedicated to its configuration and log files:
...
The first one (/sapmnt/<SID>/global/security/dvd_conn) stores Kerberos- and SSL-related files and is shared among the SAP application servers.
The second stores the drivers, configuration, and log files of the Java connector used by Datavard Glue. Its ownership and permissions must be set appropriately for <sid>adm.
Create the directories on each SAP application server to store the Storage Management related configuration and log files. The <sid>adm user needs permission to access these folders.
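A minimal sketch for one application server, assuming the SID DVQ; the path of the local Java connector directory is an assumption:

$ mkdir -p /sapmnt/DVQ/global/security/dvd_conn
$ mkdir -p /usr/sap/DVQ/dvd_conn
$ chown -R dvqadm:sapsys /sapmnt/DVQ/global/security/dvd_conn /usr/sap/DVQ/dvd_conn
$ chmod -R 750 /sapmnt/DVQ/global/security/dvd_conn /usr/sap/DVQ/dvd_conn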
JDBC Drivers
The JDBC protocol is used to connect to the Hadoop services (Hive and Impala). The JDBC drivers have to be stored manually on the operating system and must be accessible to the Datavard connector.
...
https://help.sap.com/saphelp_nw73/helpdata/en/e2/16d0427a2440fc8bfc25e786b8e11c/content.htm
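As an illustration, a Hive JDBC driver jar might simply be copied into a directory readable by the connector; the jar name and target path below are assumptions:

$ cp HiveJDBC41.jar /usr/sap/DVQ/dvd_conn/drivers/
$ chown dvqadm:sapsys /usr/sap/DVQ/dvd_conn/drivers/HiveJDBC41.jar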
CONFIGURATION
When all prerequisites are fulfilled, further configuration is done from within the SAP system.
RFC Destinations
Three RFC connections need to be created via transaction SM59.
HttpFS RFC
This RFC connection is used for communication with Hadoop's HttpFS service, which mediates operations on HDFS.
...
It is important to enable the AUTH_CONN program registration in the SAP gateway (SAP gateway access).
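Registration can be permitted, for example, via the gateway's reginfo file (profile parameter gw/reg_info); the entry below follows the standard reginfo format, and the HOST/ACCESS values are placeholders to be adapted to the local security policy:

P TP=AUTH_CONN HOST=internal,local ACCESS=internal,local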
Java RFC
The Java RFC refers by name to the Java service that is used for communication with the Hadoop services. Again, the setup is basic; the parameters of the Java connector are defined in separate tables:
...
It is important to enable the JAVA_CONN program registration in the SAP gateway (SAP gateway access).
Java connector setup
The Java connectors are configured using files that have to be defined in the SAP system.
The configuration consists of the following steps.
Logical file path definition
The first step is to map the logical path ZHADOOP_SECURITY to the OS path where the files are stored. The actual OS path is created in the section OS directories (/sapmnt/<SID>/global/security/dvd_conn).
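In transaction FILE, the definition might look as follows; the syntax group UNIX is an assumption, and <SYSID> and <FILENAME> are standard placeholders resolved by the SAP system at runtime:

Logical path:   ZHADOOP_SECURITY
Syntax group:   UNIX
Physical path:  /sapmnt/<SYSID>/global/security/dvd_conn/<FILENAME>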
Kerberos logical file definition
Before setting up the Hadoop storage itself, three files are required for successful Kerberos authentication. They need to be defined as logical names in the SAP system via transaction FILE.
Once the logical path is defined, the file definitions follow:
ZHADOOP_KRB_KEYTAB and ZHADOOP_KRB_CONFIG refer to the Kerberos keytab of the <sid>hdp user and to the Kerberos configuration file described in the section Kerberos keytab and configuration files, respectively. ZHADOOP_CDH_DRIVER refers to the custom Cloudera driver configuration file, which is generated during the storage activation.
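The three definitions might then look as follows; the physical file names are assumptions:

Logical file: ZHADOOP_KRB_KEYTAB   Physical file: dvqhdp.keytab         Logical path: ZHADOOP_SECURITY
Logical file: ZHADOOP_KRB_CONFIG   Physical file: krb5.conf             Logical path: ZHADOOP_SECURITY
Logical file: ZHADOOP_CDH_DRIVER   Physical file: cloudera_driver.conf  Logical path: ZHADOOP_SECURITY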
...