...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Prerequisites
Open ports
In a controlled network environment, it is common to have firewall rules in place. To enable communication between SAP systems and Hadoop, the following ports on the Hadoop cluster need to be reachable from the SAP system:
...
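Whether a given port is reachable can be checked from the SAP host, for example with nc. The hostname below is illustrative; 10000 is the default HiveServer2 port and 14000 the default HttpFS port:

$ nc -zv hadoopnode.example.com 10000
$ nc -zv hadoopnode.example.com 14000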
Proper DNS name translation needs to be configured between SAP and Hadoop for Kerberos communication.
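A quick sanity check is that forward and reverse lookups agree on both sides; the hostname and address below are illustrative:

$ getent hosts hadoopnode.example.com
10.0.0.15       hadoopnode.example.com
$ getent hosts 10.0.0.15
10.0.0.15       hadoopnode.example.com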
Hive parameters
Two configuration parameters of the Hive service need to be set in the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
Example:
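Once the Safety Valve change has been deployed and the Hive service restarted, the effective values can be verified from a Hive session; the connection URL below is illustrative:

$ beeline -u "jdbc:hive2://hiveserver.example.com:10000/default" \
    -e "SET hive.exec.dynamic.partition; SET hive.exec.dynamic.partition.mode;"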
Hadoop user
Datavard recommends creating a distinct user for every SAP system connected to the Hadoop cluster in order to isolate each system's data.
Hadoop users are usually kept in a central repository (LDAP/AD), but the user can also be created locally (on every Hadoop cluster node).
Each of these users must have its own dedicated user group.
If Sentry is used: the user groups are used in the definition of Sentry access rules.
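A minimal sketch of the local variant, assuming the SID is DVQ and the user and group are both named dvqhdp (run the OS commands on every cluster node):

$ sudo groupadd dvqhdp
$ sudo useradd -g dvqhdp -m dvqhdp
$ sudo -u hdfs hdfs dfs -mkdir -p /user/dvqhdp
$ sudo -u hdfs hdfs dfs -chown dvqhdp /user/dvqhdp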
...
$ hdfs dfs -ls -d /user/dvqhdp
-rwxrwxr-x   3 dvqhdp supergroup          0 --- /user/dvqhdp
If Kerberos is used: create a Kerberos principal for the user. You need to run kadmin.local on the host where the Kerberos database is located.
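A sketch of the principal and keytab creation, assuming the user dvqhdp, the realm EXAMPLE.COM, and an illustrative keytab path:

$ sudo kadmin.local
kadmin.local:  addprinc -randkey dvqhdp@EXAMPLE.COM
kadmin.local:  xst -norandkey -k /etc/security/keytabs/dvqhdp.keytab dvqhdp@EXAMPLE.COM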
Hive database
Datavard recommends creating a dedicated database (schema) in Hive for each SAP system. The recommended database name is sap<sid> (e.g. sapdvq).
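The database can be created, for example, with beeline; the connection URL is illustrative:

$ beeline -u "jdbc:hive2://hiveserver.example.com:10000/default" \
    -e "CREATE DATABASE IF NOT EXISTS sapdvq;"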
...
SAP Java Connector library
The SAP Java Connector 3.0 library libsapjco3.so needs to be accessible; it is available for download from the SAP marketplace. It must be present in the SAP kernel directory or in another directory referenced by the LD_LIBRARY_PATH environment variable of the <sid>adm user.
$ which libsapjco3.so
/usr/sap/DVQ/SYS/exe/uc/linuxx86_64/libsapjco3.so
OS directories
The Hadoop connector uses two directories dedicated to its configuration and log files:
...
The first one (/sapmnt/<SID>/global/security/dvd_conn) stores Kerberos- and SSL-related files and is shared among the SAP application servers.
The second stores the drivers, configuration, and log files of the Java connector used by Datavard Glue. Its ownership and permissions must be set appropriately for <sid>adm.
Create the directories on each SAP application server to store the Storage Management related configuration and log files. The <sid>adm user needs permission to access these folders.
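A minimal sketch for one application server, assuming the SID DVQ; the path of the local Java connector directory is an assumption:

$ mkdir -p /sapmnt/DVQ/global/security/dvd_conn
$ mkdir -p /usr/sap/DVQ/dvd_conn
$ chown -R dvqadm:sapsys /sapmnt/DVQ/global/security/dvd_conn /usr/sap/DVQ/dvd_conn
$ chmod -R 750 /sapmnt/DVQ/global/security/dvd_conn /usr/sap/DVQ/dvd_conn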
JDBC Drivers
The JDBC protocol is used to connect to the Hadoop services (Hive and Impala). The JDBC drivers have to be stored manually on the operating system and must be accessible to the Datavard connector.
...
https://help.sap.com/saphelp_nw73/helpdata/en/e2/16d0427a2440fc8bfc25e786b8e11c/content.htm
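As an illustration, a Hive JDBC driver jar might simply be copied into a directory readable by the connector; the jar name and target path below are assumptions:

$ cp HiveJDBC41.jar /usr/sap/DVQ/dvd_conn/drivers/
$ chown dvqadm:sapsys /usr/sap/DVQ/dvd_conn/drivers/HiveJDBC41.jar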
CONFIGURATION
When all prerequisites are fulfilled, further configuration is done from within the SAP system.
RFC Destinations
Three RFC connections need to be created via transaction SM59.
HttpFS RFC
This RFC connection is used for communication with Hadoop's HttpFS service, which mediates operations on HDFS.
...
It is important to enable the AUTH_CONN program registration in the SAP gateway (SAP gateway access).
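Registration can be permitted, for example, via the gateway's reginfo file (profile parameter gw/reg_info); the entry below follows the standard reginfo format, and the HOST/ACCESS values are placeholders to be adapted to the local security policy:

P TP=AUTH_CONN HOST=internal,local ACCESS=internal,local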
Java RFC
The Java RFC refers by name to the Java service that is used for communication with the Hadoop services. Again, the setup is basic; the parameters of the Java connector are defined in separate tables:
...
It is important to enable the JAVA_CONN program registration in the SAP gateway (SAP gateway access).
Java connector setup
The Java connectors are configured using files that have to be defined in the SAP system.
The configuration consists of the following steps.
Logical file path definition
The first step is to map the logical path ZHADOOP_SECURITY to the OS path where the files are stored. The actual OS path is created in the section OS directories (/sapmnt/<SID>/global/security/dvd_conn).
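In transaction FILE, the definition might look as follows; the syntax group UNIX is an assumption, and <SYSID> and <FILENAME> are standard placeholders resolved by the SAP system at runtime:

Logical path:   ZHADOOP_SECURITY
Syntax group:   UNIX
Physical path:  /sapmnt/<SYSID>/global/security/dvd_conn/<FILENAME>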
Kerberos logical file definition
Before setting up the Hadoop storage itself, three files are required for successful Kerberos authentication. They need to be defined as logical names in the SAP system via transaction FILE.
Once the logical path is defined, the file definitions follow:
ZHADOOP_KRB_KEYTAB and ZHADOOP_KRB_CONFIG refer to the Kerberos keytab of the <sid>hdp user and to the Kerberos configuration file described in the section Kerberos keytab and configuration files, respectively. ZHADOOP_CDH_DRIVER refers to the custom Cloudera driver configuration file, which is generated during the storage activation.
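The three definitions might then look as follows; the physical file names are assumptions:

Logical file: ZHADOOP_KRB_KEYTAB   Physical file: dvqhdp.keytab         Logical path: ZHADOOP_SECURITY
Logical file: ZHADOOP_KRB_CONFIG   Physical file: krb5.conf             Logical path: ZHADOOP_SECURITY
Logical file: ZHADOOP_CDH_DRIVER   Physical file: cloudera_driver.conf  Logical path: ZHADOOP_SECURITY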
...