
Warning: System command

The Java service is started with a system command. You can adjust the name of this command in the table /DVD/JAVA_CONFIG with the parameter JAVA_START_CMD; the default name of the command is ZDVD_START_JAVA. If the system command doesn't exist, it is created automatically. You can view system commands via transaction SM69.

On Linux, another system command, which sets executable rights for the configuration files (chmod 755 <filename>), is required. Its name can be adjusted with the parameter CHMOD_CMD; the default value is ZDVD_CHMOD.
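
For illustration, the corresponding SM69 definition could look as follows; the field layout follows SM69, while the split between the fixed parameter and the file name passed at runtime is an assumption:

Code Block: Example SM69 external command (illustrative)
Command name:                             ZDVD_CHMOD
Operating system:                         Linux
Operating system command:                 chmod
Parameters for operating system command:  755
Additional parameters allowed:            X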

...

Code Block
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
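
To verify the effect before changing the cluster-wide configuration, the same parameters can also be set per session, for example in Beeline:

Code Block: Per-session alternative (illustrative)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;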

Example:

[Screenshot]

Hadoop user

We recommend creating distinct users for every SAP system connected to the Hadoop cluster in order to isolate each system's data. 
There is usually a central repository for Hadoop users (LDAP/AD), but you can also create the user locally (on every Hadoop cluster node).
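
A minimal sketch of the local variant, using the example username dvqhdp from this guide (run the hdfs commands as the HDFS superuser):

Code Block: Creating a local Hadoop user with an HDFS home directory (illustrative)
# create the OS user on every cluster node
useradd dvqhdp
# create the user's HDFS home directory and hand over ownership
hdfs dfs -mkdir /user/dvqhdp
hdfs dfs -chown dvqhdp:dvqhdp /user/dvqhdp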

...

Server certificates of hosts that Glue will communicate with need to be placed in this truststore. An alternative is to copy the complete jssecacerts truststore from any Hadoop node and place it in this path.
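
A minimal sketch of importing one server certificate with keytool (alias, certificate file, and the common default password 'changeit' are example values):

Code Block: Importing a server certificate into the truststore (illustrative)
keytool -importcert -alias hadoop-node1 -file hadoop-node1.crt \
  -keystore jssecacerts -storepass changeit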

SAP prerequisites

Kerberos cookie encoding

By default, the SAP system encodes certain characters in cookies. SAP Note 1160362 describes this behavior in detail. As the Kerberos cookie must not be modified in any way for the Kerberos server to accept it, this encoding should be disabled by setting the following parameter in each SAP application server's instance profile:

Code Block: Value for SAP Kernel versions lower than 7.53 patch level 5
ict/disable_cookie_urlencoding = 1

...

Warning: Incompatible Kernel version

SAP kernel 7.53 patch level 5 introduced a change in this parameter which causes Storage Management to malfunction. Hadoop storage therefore doesn't work on SAP kernel 7.53 patch levels 5 through 222. As of SAP kernel 7.53 patch level 223, the value can be set to "2", which restores the desired behavior. The issue is described in SAP Note 2681175.


Code Block: Value for SAP Kernel versions higher than 7.53 patch level 222
ict/disable_cookie_urlencoding = 2

The parameter is dynamic in kernel version 7.53 and higher.

SAP RFC role and user

The Java connector uses a dedicated user in the SAP system for communication. In our reference configuration, we use the username 'hadoop'. This user needs to be created with the type 'Communications Data' and with authorizations limiting its privileges to basic RFC communication.
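
As a sketch, the core of such a role is the S_RFC authorization object; the /DVD/* function group value below is an assumption and depends on your installation:

Code Block: Example S_RFC authorization values (illustrative)
Object: S_RFC
  ACTVT:    16 (Execute)
  RFC_TYPE: FUGR (function group)
  RFC_NAME: /DVD/*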

...

Example of custom SAP role in PFCG transaction (Display Authorization Data):

[Screenshot]

SSL for SAP RFCs

...

The HTTP service has to be active in the SAP system. It can be checked via transaction
SMICM > [Goto] > Services

[Screenshot]

There are two particularly important parameters affecting the HTTP communication of the SAP system:

...

The name and description of the destination are optional, but it is recommended to indicate its purpose with the keywords 'Hadoop' and 'HttpFS'. In our example, the name of the RFC destination also contains the Hadoop server hosting the HttpFS service for the sake of clarity:

[Screenshot]

Entries explained:

  • Connection Type – G for HTTP connection to an external service
  • Target host – FQDN of the Hadoop server hosting the HttpFS service
  • Service No. – the port number on which the HttpFS service is listening (default is 14000)
  • Path Prefix – this string consists of two parts
    1. The /webhdfs/v1 part is mandatory
    2. The /user/dvqhdp part defines the Hadoop user's 'root' directory in HDFS, where flat files from the SAP system will be loaded
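
Put together, a request issued through this destination resolves to a WebHDFS URL of the following shape (host name is an example):

Code Block: Resulting WebHDFS request (illustrative)
GET http://hadoop01.example.com:14000/webhdfs/v1/user/dvqhdp/sample.csv?op=OPEN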

If SSL is used, it is necessary to enable SSL and select the client certificate list to be used in the Logon & Security tab.

[Screenshot]

Authentication RFC

This RFC connection is part of the authentication mechanism towards any Hadoop cluster in a kerberized environment. 

The RFC setup is very basic, as the parameters affecting authentication are defined elsewhere. It is recommended to use the generic RFC name 'HADOOP_AUTH_CONN':

[Screenshot]

Entries explained:

  • Connection Type – T for TCP/IP Connection
  • Activation Type – select Registered Server Program
  • Program ID – AUTH_CONN

...

The Java RFC refers by name to the Java service used for communication with Hadoop services. Again, the setup is basic, and the parameters of the Java connector are defined in separate tables:

[Screenshot]

Entries explained:

  • Connection Type – T for TCP/IP Connection
  • Activation Type – select Registered Server Program
  • Program ID – JAVA_CONN

...

The first step is to map the logical path ZHADOOP_SECURITY to the OS path where the files are stored. The actual OS path is created in the section Datavard connector directories (/sapmnt/<SID>/global/security/dvd_conn).

[Screenshot]
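
In text form, the mapping in transaction FILE could look like this (the <FILENAME> placeholder and syntax group follow the standard FILE conventions):

Code Block: Logical path definition (illustrative)
Logical path:   ZHADOOP_SECURITY
Syntax group:   UNIX
Physical path:  /sapmnt/<SID>/global/security/dvd_conn/<FILENAME>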

Kerberos logical file definition

...

When the logical path is defined, the file definition follows:

[Screenshots]

ZHADOOP_KRB_KEYTAB and ZHADOOP_KRB_CONFIG refer to the Kerberos keytab of the <sid>hdp user and the Kerberos configuration file, both defined in the section Kerberos keytab and configuration files. ZHADOOP_CDH_DRIVER refers to the custom Cloudera driver configuration file, which is generated during the storage activation.
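
A sketch of one such definition; the physical file name is an example, the logical path is the one mapped above:

Code Block: Logical file definition (illustrative)
Logical file:   ZHADOOP_KRB_KEYTAB
Physical file:  <sid>hdp.keytab
Data format:    BIN
Logical path:   ZHADOOP_SECURITY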

...

If the Hadoop services cluster resides in a safe environment that is accessible only with SSL authentication, the following logical file needs to be defined:

[Screenshot]

Drivers logical file definition

...

In our example, we will be using the Hive and Impala JDBC drivers provided by Cloudera. The first step is to map the logical path ZJDBC_DRIVER_PATH to the OS path where the files are stored (in our case /usr/sap/<SID>/dvd_conn/drivers/).

Example:

[Screenshot]

When the logical path is defined, a definition of the driver-specific folders follows:

[Screenshot]


ZJDBC_HIVE_CLOUDERA_JARS and ZJDBC_IMPALA_CLOUDERA_JARS refer to the folders in which the Hive and Impala JDBC drivers provided by Cloudera were placed in the section JDBC Drivers.
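
For illustration, the resulting directory layout might look like this (jar file names vary by driver version and are examples):

Code Block: Driver directory layout (illustrative)
/usr/sap/<SID>/dvd_conn/drivers/
├── hive/
│   └── HiveJDBC41.jar
└── impala/
    └── ImpalaJDBC41.jar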

...

The table can be maintained via transaction SM30.

Sample entry:

[Screenshot]

Entries explained:

  • Destination – HttpFS RFC destination created in HttpFS RFC
  • User Name – Hadoop user principal created in Hadoop user, group, and HDFS directory
  • Auth. Method – authentication method towards the Hadoop cluster
  • Krb. Keytab – logical file definition for Kerberos keytab file
  • Krb. Config – logical file definition for Kerberos configuration file
  • Krb. Service RFC – authentication RFC destination created in Authentication RFC
  • SSL Keystore – logical file definition for SSL Keystore
  • SSL Password – password for accessing SSL Keystore

...

Table entries can be maintained via transaction if needed.

Table example:

[Screenshot]

/DVD/JAVA_CONFIG – stores parameters for Datavard Java connectors. The table needs to be populated with entries via transaction SE16.

Sample configuration:

[Screenshot]
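
In text form, the two parameters mentioned in the Java service section would appear as entries similar to these (column names are assumptions):

Code Block: Example /DVD/JAVA_CONFIG entries (illustrative)
PARAMETER        VALUE
JAVA_START_CMD   ZDVD_START_JAVA
CHMOD_CMD        ZDVD_CHMOD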

Prerequisites:

Multiple entries from the /DVD/JAVA_CONFIG table define the location and filename where a specific file will be generated. These will be marked as (generated) in the detailed explanation below.

...

Use report /DVD/XOR_GEN for this purpose.

[Screenshot]


Storage Management setup

...

/DVD/SM_SETUP > [Edit mode] > [New storage]

[Screenshot]

Entries explained:

  • Storage ID – name of the storage
  • Storage Type – choose HADOOP for HDFS
  • Description – extended description of the storage for easier identification
  • RFC Destination – HttpFS RFC destination defined in HttpFS RFC

...

The Hive metastore storage is created in a similar way to the HDFS storage, but the values are different:

[Screenshot]

Entries explained:

  • Storage ID – name of the storage
  • Storage Type – choose SM_TRS_HV2 for Hive
  • Description – extended description of the storage for easier identification
  • Database – Hive database created in Hive database
  • Hive host – Hadoop server hosting the Hive service
  • Hive username – Hadoop user created in Hadoop user, group, and HDFS directory
  • Hive password – password for the Hive user
  • Impala host – Hadoop server hosting the Impala service
  • Impala username – Hadoop user created in Hadoop user, group, and HDFS directory
  • Staging location type – storage location for the data staging area (external CSV tables)
  • Staging location url (non-default) – URL address for the data staging area (e.g. Azure Data Lake)
  • HTTP RFC Destination – HttpFS RFC destination defined in HttpFS RFC
  • HTTP RFC Destination (HA) – HttpFS RFC destination (High Availability)
  • Java connector RFC – Hive RFC destination defined in Hive RFC
  • Load Engine – engine used for loading (writing) data, e.g. Hive or Impala
  • Read Engine – engine used for reading data, e.g. Hive or Impala
  • Load driver classname – classname of the driver used for loading (e.g. Cloudera Hive – com.cloudera.hive.jdbc41.HS2Driver)
  • Load driver path – logical name of the Load driver file
  • Read driver classname – classname of the driver used for reading (e.g. Cloudera Impala – com.cloudera.impala.jdbc41.Driver)
  • Read driver path – logical name of the Read driver file
  • Use custom connection string checkbox – if checked, the custom connection string is used
  • Custom connection string – standard settings are ignored and the custom connection string is used instead (see the example after this list)
  • Use Kerberos checkbox – checked in case the Hadoop cluster is Kerberized
  • Kerberos config file path – logical name of the Kerberos configuration file defined in Kerberos logical file definition
  • Hive service principal – Kerberos principal of the Hive service; must reflect the Hive host
  • Impala service principal – Kerberos principal of the Impala service; must reflect the Impala host
  • Kerberos keytab path – logical name of the Kerberos principal keytab file defined in Kerberos logical file definition
  • Kerberos user principal – user name (Kerberos principal) defined for the connection to Hadoop
  • File Type – file format in which Hive stores table data on HDFS
  • Compression codec – compression codec used for storing data on HDFS
  • HDFS Permissions – UNIX permissions for files created on HDFS
  • Use Cloudera drivers checkbox – always checked if the Hadoop cluster is a Cloudera distribution
  • Cloudera driver config path – logical name of the Cloudera driver configuration file defined in Kerberos logical file definition
  • Skip trash – checked if HDFS files should NOT be moved to trash after deletion
  • Impala port – Impala JDBC port
  • Hive port – Hive JDBC port
  • Hints for hive/impala – hints that can be specified for the JDBC connection (e.g. SYNC_DDL=TRUE)
  • Open cursor logic – selects which logic is used for reading via cursor
  • Repetition for HDFS – number of times an HDFS request is repeated in case of failure
  • Use compression on transfer – checked in case compression is used for files created on HDFS
  • Compression level – level of compression (0 = minimum, 9 = maximum)
  • Use SSL – checked if SSL authentication should be used
  • SSL Keystore path – logical name of the SSL Keystore file
  • SSL Keystore password – password to the Keystore
  • Force file cursor reader checkbox (expert setting) – the cursor reader is always used when reading data from Hadoop
  • Use extended escaping checkbox (expert setting) – extended escaping is always used when writing data to Hadoop
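
For the Custom connection string field, a Kerberized, SSL-enabled connection through the Cloudera Hive driver could look along these lines (host, realm, port, and truststore path are example values):

Code Block: Example custom connection string (illustrative)
jdbc:hive2://hadoop01.example.com:10000/default;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=hadoop01.example.com;KrbServiceName=hive;SSL=1;SSLTrustStore=/sapmnt/<SID>/global/security/dvd_conn/jssecacerts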

...

In case of any issues with the Java connector, the Java logs can be read from the application server with the report /DVD/SM_HIVE_DISPLAY_JAVA_LOG to determine the source of the issue.

[Screenshot]

Entries explained:

  • RFC Destination – destination of the RFC with the Java service
  • Num. of last lines to display – how many trailing lines of the log should be displayed
  • Name of file with log – filename of the Java log file (in the default destination)
  • Is log archived? – check to display an archived log (date obligatory)
  • Date of log – date of the archived log to display