(SM-2302) Hadoop Storage Setup Old

Prerequisites

Open ports

The following ports need to be open from the SAP system (with OTB) towards the Hadoop cluster:

Port     Hadoop service
10000    Hive
14000    HttpFS
21050    Impala
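
A quick way to verify connectivity from the SAP host is a simple TCP check, for example with nc (a sketch; the host name is taken from the Kerberos example later in this guide and has to be replaced with your Hadoop node):

$ nc -vz skbtscck21.hadoop.local 10000
$ nc -vz skbtscck21.hadoop.local 14000
$ nc -vz skbtscck21.hadoop.local 21050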

Hive parameters

Two configuration parameters of the Hive service need to be set in the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml:

hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict

Example:
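
In the hive-site.xml safety valve, the two parameters would appear roughly as follows (a sketch of the XML snippet):

<property>
  <name>hive.exec.dynamic.partition</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>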

Hadoop user

SNP recommends creating a distinct user for every SAP system connected to the Hadoop cluster in order to isolate each system's data.
Usually, there is a central repository for Hadoop users (LDAP/AD), but you can also create the user locally.
Each of these users needs to have its own dedicated user group.
If Hadoop Sentry is used: The user groups will be used for the definition of Sentry access rules.
The recommended user name mirrors SAP's naming convention for users (as in <sid>adm): <sid>hdp.
Create the user's Kerberos principal in the form <sid>hdp@<KERBEROS_REALM>.
The user's home directory on HDFS has to be created manually with appropriate permissions:

$ hdfs dfs -ls -d /user/dvqhdp
-rwxrwxr-x   3 dvqhdp supergroup           0 2017-02-24 11:05 /user/dvqhdp
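
The directory can be created, for example, like this (a sketch for SID DVQ, run as the HDFS superuser):

$ sudo -u hdfs hdfs dfs -mkdir -p /user/dvqhdp
$ sudo -u hdfs hdfs dfs -chown dvqhdp /user/dvqhdp
$ sudo -u hdfs hdfs dfs -chmod 775 /user/dvqhdp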

Importantly, the user has to be defined identically on every Hadoop cluster node.
If Kerberos is used: Create a Kerberos principal for the user. You need to run kadmin.local on the host where the Kerberos database is running:
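
A sketch for SID DVQ and the HADOOP.LOCAL realm used in the krb5.conf example below (the keytab path is only an example):

$ kadmin.local -q "addprinc -randkey dvqhdp@HADOOP.LOCAL"
$ kadmin.local -q "xst -norandkey -k /etc/security/keytabs/dvqhdp.keytab dvqhdp@HADOOP.LOCAL"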

Hive database

SNP recommends creating a dedicated database (schema) in Hive for each SAP system. The recommended database name is sap<sid> (e.g. sapdvq).
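
For example, for SID DVQ the database can be created in beeline or Hue with a single statement (a sketch):

CREATE DATABASE sapdvq;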

If Hadoop Sentry is used: Two Sentry rules have to be created which enable all actions of the <sid>hdp user on the sap<sid> database and on its home directory in HDFS.
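Expressed as Sentry SQL statements (a sketch for SID DVQ, executed e.g. in beeline by a Sentry administrator; the role name and the HDFS URI are assumptions):

CREATE ROLE sapdvq_all;
GRANT ALL ON DATABASE sapdvq TO ROLE sapdvq_all;
GRANT ALL ON URI 'hdfs://<namenode_host>:8020/user/dvqhdp' TO ROLE sapdvq_all;
GRANT ROLE sapdvq_all TO GROUP dvqhdp;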
If HDFS ACL synchronization with Sentry permissions is enabled, the user's home directory has to be added to the Sentry Synchronization Path Prefixes parameter in the HDFS service configuration.
More on the HDFS ACL synchronization topic can be found at https://www.cloudera.com/documentation/enterprise/latest/topics/sg_hdfs_sentry_sync.html

SAP system copy

To ensure correct functionality of the SNP Outboard™ and/or SNP Glue™ products with Hadoop after an SAP system copy, both the Hive metastore and the HDFS data (the hive and user folders) need to be copied to the new environment. The storage then needs to be configured to point to the copied data.
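
For the HDFS part, one possible approach is DistCp (a sketch; the cluster addresses and the warehouse path are placeholders/assumptions, and the Hive metastore itself has to be migrated with the tooling of your Hadoop distribution):

$ hadoop distcp hdfs://<source_nn>:8020/user/dvqhdp hdfs://<target_nn>:8020/user/dvqhdp
$ hadoop distcp hdfs://<source_nn>:8020/user/hive/warehouse/sapdvq.db hdfs://<target_nn>:8020/user/hive/warehouse/sapdvq.db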

OS prerequisites (underneath the SAP system)

Java version

Java 1.7 or higher needs to be installed and available to the <sid>adm user.
Another option is to copy a JRE (version 1.7 or higher) to a folder which is accessible to the <sid>adm user.
We found that when connecting to a Kerberized cluster, the base releases (1.7.0 or 1.8.0 without updates) do not work well - there is a problem with the authentication method.
Therefore, to connect to a Kerberized cluster, it is necessary to use a patched Java version (e.g. 1.8.0_102).

SAP Java Connector Library

The SAP Java Connector 3.0 library needs to be accessible. It can be downloaded from the SAP marketplace. The SAR archive contains two files: libsapjco3.so and sapjco3.jar. The directory containing libsapjco3.so needs to be included in the LD_LIBRARY_PATH environment variable of the <sid>adm user.
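
A sketch of extracting the library and exposing it to the <sid>adm user (the target directory and archive location are examples; SAPCAR is assumed to be available):

$ cd /usr/sap/DVQ/dvd_conn
$ SAPCAR -xvf /tmp/SAPJCO3*.SAR
$ export LD_LIBRARY_PATH=/usr/sap/DVQ/dvd_conn:$LD_LIBRARY_PATH

For a permanent setup, place the export in the <sid>adm user's shell profile.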

OS directories

Create directories on each SAP application server to store Storage Management related configuration and log files. <sid>adm user needs permission to access those folders.

/usr/sap/<sid>/dvd_conn
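
For example, for SID DVQ (a sketch; the sapsys group is an assumption):

$ mkdir -p /usr/sap/DVQ/dvd_conn
$ chown dvqadm:sapsys /usr/sap/DVQ/dvd_conn
$ chmod 755 /usr/sap/DVQ/dvd_conn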

Kerberos config file

Create the Kerberos config file to be used by the SAP backend user (it can be copied from a Hadoop cluster host, /etc/krb5.conf):

Example:

[libdefaults]
default_realm = HADOOP.LOCAL
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = rc4-hmac
default_tkt_enctypes = rc4-hmac
permitted_enctypes = rc4-hmac
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
HADOOP.LOCAL = {
kdc = skbtscck21.hadoop.local
admin_server = skbtscck21.hadoop.local
}
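
The file (together with the keytab) can be verified from the OS level by obtaining a ticket as the <sid>adm user (a sketch; the paths are examples):

$ export KRB5_CONFIG=/usr/sap/DVQ/dvd_conn/krb5.conf
$ kinit -kt /usr/sap/DVQ/dvd_conn/dvqhdp.keytab dvqhdp@HADOOP.LOCAL
$ klist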


SAP prerequisites

Kerberos cookie encoding

If the Hadoop cluster is kerberized, cookie encoding needs to be disabled in every application server's profile (instance profile):

ict/disable_cookie_urlencoding = 1

RFC user

A dedicated RFC user has to be created. The user type Communications Data is sufficient. This user has to have a role with all authorizations for RFC.


Configuration

RFCs

HttpFS RFC

Create HttpFS RFC destination:
Transaction SM59 > [Create]

[Save]
RFC Destination: Name is up to the customer.
Connection type: G (HTTP Connection to External Service).
Target host: Hadoop node running the HttpFS service.
Service No.: 14000 by default, or any other port the HttpFS service is listening on.
Path Prefix: /webhdfs/v1/user/<sid>hdp - the /webhdfs/v1 part is fixed; the following part of the path (in this case /user/<sid>hdp) is the path within HDFS where data uploaded from the configured SAP system will be stored.
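
A filled-in destination for SID DVQ might look like this (the destination name and the host are assumptions based on the examples in this guide):

RFC Destination: HADOOP_HTTPFS
Connection Type: G
Target Host:     skbtscck21.hadoop.local
Service No.:     14000
Path Prefix:     /webhdfs/v1/user/dvqhdp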

Hive RFC

Create TCP/IP RFC destination:

[Save]
RFC Destination: This RFC will always communicate with the Java connector and as such its name can remain the default HADOOP_HIVE_CONN.
Connection type: T (TCP/IP Connection).
(o) Registered Server Program.
Program ID: HIVE_CONN.

Authentication RFC

IF KERBEROS IS USED: Create TCP/IP RFC destination:

[Save]
RFC Destination: This RFC will supply Kerberos credentials and as such its name can remain the default HADOOP_AUTH_CONN.
Connection type: T (TCP/IP Connection).
(o) Registered Server Program.
Program ID: AUTH_CONN.


Storage Management setup

Kerberos logical file definition

IF KERBEROS IS USED: Two files have to be defined in the SAP system - the Kerberos configuration file and the SAP backend user's keytab. If Cloudera drivers are used, a third file - the Cloudera driver configuration - has to be defined as well.
First, define the logical path where these files are located.

Transaction: FILE > Logical File Path Definition > [New Entries]

Logical file path: ZHADOOP_SECURITY
[Save] (create/assign own workbench request if necessary)
Select the newly created logical path and double-click Assignment of Physical Paths to Logical Path.
[New Entries]

Syntax group: UNIX
[Save]

NOTE: The <sid>adm user needs read and write permissions for this folder.

Now that the path is defined, select Logical File Name Definition, Cross-Client > [New Entries] and define 2 logical files (3 in case of using Cloudera drivers):

One for the keytab file, one for the Kerberos config file, and optionally one for the Cloudera driver config file, as illustrated below.
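
For illustration, the definitions could look like this (the logical names are only suggestions; the physical path reuses the directory created in the OS prerequisites and the standard <SYSID>/<FILENAME> placeholders of transaction FILE):

Logical file path:      ZHADOOP_SECURITY
Physical path (UNIX):   /usr/sap/<SYSID>/dvd_conn/<FILENAME>

Logical file: ZHADOOP_KERBEROS_KEYTAB    Physical file: <sid>hdp.keytab
Logical file: ZHADOOP_KERBEROS_CONFIG    Physical file: krb5.conf
Logical file: ZHADOOP_CLOUDERA_CONFIG    Physical file: cloudera.properties   (only with Cloudera drivers)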

Java connector configuration

Several configuration tables are imported to the system with SNP transports.
The /DVD/HDP_CUS_C table supplies the credentials for authentication towards the Hadoop HttpFS service.
If the authentication is not enforced (no Kerberos), run:
SM30 > Table/View: /DVD/HDP_CUS_C > [Maintain] > [New Entries]

IF KERBEROS IS USED:

The /DVD/JAVA_CONFIG table stores the parameters of the Java connector.

This table can be edited via this report: /DVD/SM_HIVE_EDIT_CONFIG


Fill in this table for version 34.

IF KERBEROS IS USED: Fill in the table for version 32 as well.


Parameter name     Parameter value
PASSWORD           Password of the RFC user created in the Prerequisites chapter.
USERNAME           Username of the RFC user.
PROG_ID            Program ID of the RFC destination created in the Configuration chapter.
REP_DEST           ABAP_AS_WITH_POOL
MAX_RAM_USED       For the authorization service, we recommend 50M (50 MB); for the Hive connection, at least 1G (1 GB) - more if possible.
WORK_THREAD_MIN    5
WORK_THREAD_MAX    10
CONN_COUNT         10
PEAK_LIMIT         10
CLIENT             Client of the RFC user.
CONFIG_AS_PATH     Path to the config file (the path specified in the Prerequisites chapter plus the filename): "<path>/[Hdp/Hive]_java_to_sap.cfg"
LOG_FILE           Path to a log file (the path specified in the Prerequisites chapter plus the filename): "<path>/[hdp/Hive]_auth.log"
CONFIG_PATH        Path to the config file (the path specified in the Prerequisites chapter plus the filename): "<path>/[hdp/hive]_auth_conn.cfg"
JAR_PATH           Path to the .jar file of the service (the path specified in the Prerequisites chapter plus the filename): "<path>/DVD[Http/Hive]Conn.jar"
JAVA_EXE           Path to the Java binary executable.
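
For illustration, the Hive connector entries (version 34) on a DVQ system could look like this (all values, user names, and paths are assumptions and have to be adapted to your landscape):

PASSWORD         <password of the RFC user>
USERNAME         SNP_RFC
PROG_ID          HIVE_CONN
REP_DEST         ABAP_AS_WITH_POOL
MAX_RAM_USED     1G
WORK_THREAD_MIN  5
WORK_THREAD_MAX  10
CONN_COUNT       10
PEAK_LIMIT       10
CLIENT           100
CONFIG_AS_PATH   /usr/sap/DVQ/dvd_conn/Hive_java_to_sap.cfg
LOG_FILE         /usr/sap/DVQ/dvd_conn/Hive_auth.log
CONFIG_PATH      /usr/sap/DVQ/dvd_conn/hive_auth_conn.cfg
JAR_PATH         /usr/sap/DVQ/dvd_conn/DVDHiveConn.jar
JAVA_EXE         /usr/java/jdk1.8.0_102/bin/java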

Create storage

Create Hive storage in Storage Management (Hive/Impala transparent storage v2).
/DVD/OUTBOARD > [Settings] > [Storage Management] > [Edit mode] > [New Storage]

IF KERBEROS IS USED:

[Confirm]
The expected system response is that the newly created storage was checked successfully.

Impala Storage

Similarly to the Hive storage, create the Impala storage:
/DVD/OUTBOARD > [Settings] > [Storage Management] > [Edit mode] > [New Storage]

The expected system response is that the newly created storage was checked successfully.