(GLUE-1806) Oozie Connector - Installation Guide
This document explains the prerequisites for scheduling standard Hadoop services via the Datavard Oozie Connector, which is used by Datavard Glue.
HttpFS / WebHDFS
This section provides information about the supported APIs for accessing data stored within Hadoop (API = Application Programming Interface). It contains the prerequisites required to enable communication between the SAP system and a Hadoop cluster.
Hadoop offers different types of access APIs. Two of them, HttpFS and WebHDFS, offer access via the HTTP protocol. Communication with both APIs is handled by Datavard’s Hadoop HTTP Connector. This enables our services to access Hadoop without any need for additional non-ABAP components.
HttpFS and WebHDFS are very similar HTTP-based services. The main difference lies in handling of redirection requested by a Hadoop NameNode. HttpFS handles the redirection itself, while WebHDFS requires assistance of the client.
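The redirection difference can be illustrated with a minimal Python sketch. It assumes the documented WebHDFS behaviour in which the NameNode answers a data operation such as OPEN with HTTP status 307 and a Location header that points to a DataNode. The function name and the host names are hypothetical and are not part of the Datavard connector.

```python
from typing import Mapping, Optional

REDIRECT_STATUS = 307  # HTTP "Temporary Redirect"


def redirect_target(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the URL the client has to call next, or None if no redirect was requested.

    With WebHDFS the client must repeat the request against the returned URL.
    With HttpFS the gateway resolves the redirect itself, so the client
    normally never sees a 307 response.
    """
    if status != REDIRECT_STATUS:
        return None
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    return None


# Hypothetical response of a NameNode to an OPEN request
namenode_headers = {
    "Location": "http://datanode.example.com:50075/webhdfs/v1/user/glue/file.csv?op=OPEN"
}
next_url = redirect_target(307, namenode_headers)
```

In this sketch a return value of None means the response already carries the result, which is the usual case when HttpFS is used.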
Prerequisites – Hadoop Cluster
Apache Hadoop version 2.0.0 or higher is installed within a corporate infrastructure, either on premise or within the cloud (e.g. Microsoft Azure, Amazon’s AWS or Cloudera CDH5).
The recommended Hadoop version is 2.4.0 or higher, in which major supportability improvements and bug fixes were applied to WebHDFS and HttpFS.
Enable/Install the HttpFS or WebHDFS service on the Hadoop cluster.
HttpFS is Datavard’s recommended option. It is more efficient because it does not require redirection to be handled by the client. It is also easier to install and configure within a corporate infrastructure because it acts as a gateway (single point of access) and can therefore be placed behind a firewall.
Apply the desired security mechanism according to the chapter Security of this document.
For the installation of the Hadoop cluster and the HttpFS or the WebHDFS services please refer to documentation supplied by your Hadoop provider or distribution.
Prerequisites – SAP
SAP NetWeaver 7.01 SPS15 or higher (ABAP stack)
Datavard’s Reuse Library 2.05 on the NetWeaver ABAP stack
Active HTTP/HTTPS service on the SAP system (which acts as a client to the Hadoop services). To verify this:
- Go to the ICM Monitor (transaction SMICM) and choose Goto → Services from the menu.
- Check that an HTTP or HTTPS service is listed as active.
Hadoop HTTP Connector
Datavard’s Hadoop HTTP Connector is a translation layer that handles communication between an SAP system and the HTTP access services (HttpFS, WebHDFS) of Hadoop.
RFC Destination
Connection parameters for an external HTTP server are stored within an RFC destination.
- Go to transaction SM59.
- Create a new RFC Connection of type ‘G’.
- Set the following parameters and save the RFC destination:
- Target Host: IPv4 address or complete host name of a server where the HTTP service is running.
- Service No.: Port of the Hadoop HTTP service
- Path Prefix: The path must have the following syntax:
/webhdfs/v1/<hdp_usr_home_path>
- /webhdfs/v1 – mandatory path prefix defined by the HttpFS and WebHDFS APIs.
- <hdp_usr_home_path> - home directory path of the target Hadoop user. Each Hadoop user has their own home directory; on UNIX-type systems this is typically ‘/user/<user_name>’. The user is the owner of this directory and has higher access privileges for it.
If the WebHDFS service is installed, the host and port must correspond to the Hadoop NameNode server.
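As an illustration of how the three RFC destination parameters combine into a request address, the following Python sketch assembles a URL for a directory listing. The function name, the host name, the port 14000 and the user directory are assumptions chosen for the example. The operation LISTSTATUS is part of the public WebHDFS REST API; how the connector builds its requests internally is not described in this guide.

```python
from urllib.parse import quote, urlencode


def build_request_url(target_host, service_no, path_prefix,
                      relative_path="", op="LISTSTATUS", scheme="http"):
    """Combine Target Host, Service No. and Path Prefix into a request URL."""
    # Normalise slashes so that prefix and relative path join cleanly.
    parts = [path_prefix.strip("/")]
    if relative_path:
        parts.append(relative_path.strip("/"))
    path = "/" + "/".join(parts)
    # quote() keeps "/" and escapes characters that are not allowed in a URL path.
    return f"{scheme}://{target_host}:{service_no}{quote(path)}?{urlencode({'op': op})}"


# Hypothetical values for an HttpFS gateway and the Hadoop user "glue"
url = build_request_url("hadoop.example.com", 14000, "/webhdfs/v1/user/glue", "data")
print(url)
```

Because the Path Prefix already ends with the user's home directory, every path handed to such a function is relative to that directory.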
Configuration
Configure the Hadoop HTTP Connector using the following steps:
- Go to the Table View Maintenance for the configuration table /DVD/HDP_CUS_C using the transaction SM30.
- Create an entry with the following values:
- RFC Destination: RFC Destination that refers to the Hadoop HTTP service (see the chapter RFC Destination)
- User Name: Name of a Hadoop user that is either the owner or has required privileges (read, write, delete) for the directory <hdp_usr_home_path> set in the RFC destination (see the chapter RFC Destination).
- Authentication Method: Choose the appropriate authentication method. For more information refer to the chapter Security.
- Save the entry.
Security
User Authentication
WebHDFS and HttpFS have built-in security supporting different authentication mechanisms. The current version of the Datavard Hadoop HTTP Connector supports the following authentication methods:
- NO_AUTHENTICATION: No Hadoop-specific authentication is applied
- PSEUDO_AUTHENTICATION: The user accessing Hadoop is identified only by the user name
- KERBEROS: Kerberos authentication (for Kerberos setup, refer to chapter Kerberos)
The Datavard Hadoop HTTP Connector also supports standard HTTP Basic authentication. In this case, user credentials are maintained within the corresponding RFC Destination.
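To make pseudo authentication more concrete: in the public WebHDFS and HttpFS REST APIs the user is passed as the query parameter user.name, without any password. The following Python sketch shows this under that assumption. The function name and the example URL are hypothetical, and the sketch does not claim to reproduce the connector's implementation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_pseudo_auth(url: str, hadoop_user: str) -> str:
    """Return the URL with the query parameter user.name set to hadoop_user."""
    parts = urlsplit(url)
    # Keep existing parameters, but drop any user.name that is already present.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "user.name"]
    query.append(("user.name", hadoop_user))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical request URL and Hadoop user
signed = with_pseudo_auth(
    "http://hadoop.example.com:14000/webhdfs/v1/user/glue?op=LISTSTATUS", "glue")
print(signed)
```

Since the user name is the only credential in this mode, it offers identification rather than protection, which is one reason to combine it with HTTPS or to use Kerberos.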
Connection
It is recommended to set up the connection using the secured HTTP (HTTPS) protocol. For information on how to set up HTTPS on Hadoop, follow instructions provided by your Hadoop provider or distribution.
To set up HTTPS in your SAP system, follow the instructions at this link: https://help.sap.com/saphelp_nw70ehp2/helpdata/en/49/23501ebf5a1902e10000000a42189c/frameset.htm
Oozie Web Services
This section provides information about the supported API for submitting and managing jobs and for retrieving job information (API = Application Programming Interface). It contains the prerequisites required to enable communication between Datavard Glue™ and Oozie Web Services.
The Oozie Web Services API is an HTTP REST JSON API. Communication with the API is handled by Datavard’s Oozie Connector, which is used by Datavard Glue™. This enables Datavard Glue™ to submit and manage jobs (e.g. Impala, Hive, Pig) and to retrieve information about them without any need for additional non-ABAP components.
Prerequisites – Hadoop Cluster
Apache Hadoop version 2.6.0 or higher is installed within a corporate infrastructure, either on premise or within the cloud (e.g. Microsoft Azure, Amazon’s AWS or Cloudera CDH5+).
Enable/Install the HttpFS or WebHDFS service on the Hadoop cluster.
Enable/Install the Oozie service on the Hadoop cluster.
Apply the desired security mechanism according to the chapter Security of this document.
For the installation of the Oozie service please refer to documentation supplied by your Hadoop provider or distribution.
The recommended Oozie version is 4.0.0 or higher, which includes major supportability improvements and bug fixes. Extended job log information requires Oozie version 4.2.0 or higher.
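The version thresholds above can be checked with a small script. The Python sketch below compares a version string against 4.0.0 and 4.2.0. It assumes the string has the form major.minor.patch, optionally followed by a vendor suffix; such a string can typically be read from the Oozie admin resource admin/build-version. The function names and the sample version are hypothetical.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from a string such as '4.1.0-cdh5.7.0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def oozie_capabilities(build_version):
    """Map an Oozie version string to the thresholds named in this guide."""
    version = parse_version(build_version)
    return {
        "recommended": version >= (4, 0, 0),       # recommended minimum version
        "extended_job_log": version >= (4, 2, 0),  # needed for extended job logs
    }


# Hypothetical version string of a vendor distribution
print(oozie_capabilities("4.1.0-cdh5.7.0"))
```

Comparing tuples of integers avoids the pitfall of comparing version strings alphabetically, where "4.10.0" would sort before "4.2.0".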
Prerequisites – SAP
SAP NetWeaver 7.01 SPS15 or higher (ABAP stack)
Datavard’s Reuse Library 2.05 on the NetWeaver ABAP stack
Active HTTP/HTTPS service on the SAP system (which acts as a client to the Hadoop services). To verify this:
- Go to the ICM Monitor (transaction SMICM) and choose Goto → Services from the menu.
- Check that an HTTP or HTTPS service is listed as active.
Datavard Oozie Connector
Datavard’s Oozie Connector is a translation layer that handles communication between an SAP system and the Oozie services.
RFC Destination
Connection parameters for an external HTTP server are stored within an RFC destination.
- Go to transaction SM59.
- Create a new RFC Connection of type ‘G’.
- Set the following parameters and save the RFC destination:
- Target Host: The name of the host the Oozie server runs on. The default value is the output of the command hostname -f on the Hadoop cluster.
- Service No.: The port the Oozie server listens on. Default value: 11000.
- Path Prefix: The path must have the following syntax: /oozie/v1
- /oozie/v1 – mandatory path prefix defined by the Oozie Web Services API.
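The following Python sketch shows how these RFC destination values map to addresses of the Oozie Web Services API. The resources jobs and job/<job-id> with the parameter show are taken from the public Oozie REST API. The function name, the host name and the job ID are hypothetical.

```python
from urllib.parse import urlencode


def oozie_url(target_host, service_no, resource, params=None,
              path_prefix="/oozie/v1", scheme="http"):
    """Build the address of an Oozie Web Services resource."""
    url = (f"{scheme}://{target_host}:{service_no}"
           f"{path_prefix.rstrip('/')}/{resource.lstrip('/')}")
    if params:
        url += "?" + urlencode(params)
    return url


HOST, PORT = "oozie.example.com", 11000    # hypothetical host, default Oozie port
JOB_ID = "0000001-200101000000000-oozie-oozi-W"  # hypothetical job ID

submit_url = oozie_url(HOST, PORT, "jobs")                         # submit a job
info_url = oozie_url(HOST, PORT, f"job/{JOB_ID}", {"show": "info"})  # job status
log_url = oozie_url(HOST, PORT, f"job/{JOB_ID}", {"show": "log"})    # job log
print(submit_url, info_url, log_url, sep="\n")
```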
Configuration
Configure the Oozie Connector using the following steps:
1. Go to the Table View Maintenance for the configuration table /DVD/OOZIE_CUS_C using the transaction SM30.
2. Create an entry with the following values:
- RFC Destination: RFC Destination that refers to the Oozie service (see the chapter RFC Destination)
- user.name: Name of a Hadoop user on whose behalf Oozie Web Services will submit and manage jobs. The user must also be the owner of, or have the required privileges (read, write, delete) for, the directory <hdp_usr_home_path> set in the RFC destination (see the chapter RFC Destination).
- queueName: The queue in which the job should run. Default value: default.
- nameNode: The host name and port of the cluster’s NameNode.
- Format: hdfs://${HOSTNAME}:${PORT}
- jobTracker: The host name and port on which the JobTracker runs
- Format: ${HOSTNAME}:${PORT}
- default.directory: A directory where all applications will be stored. The user specified in user.name must have required privileges for the directory.
- default.directory: A sub-directory where all Oozie applications will be stored.
If the Kerberos Authentication is enabled, you need to specify additional parameters:
- metastore.uri: The Hive Metastore URI.
- Format: thrift://${HOSTNAME}:${PORT}
- metastore.principal: Hive Metastore principal.
- Format: hive/_HOST@MYREALM
3. Save the entry.
4. Go to the Table View Maintenance for the configuration table /DVD/HDP_CUS_C using the transaction SM30.
5. Create an entry with the following values:
- RFC Destination: RFC Destination that refers to the Oozie service (see the chapter RFC Destination)
- User Name: Name of a Hadoop user on whose behalf Oozie Web Services will submit and manage jobs. The user must also be the owner of, or have the required privileges (read, write, delete) for, the directory <hdp_usr_home_path> set in the RFC destination (see the chapter RFC Destination).
- Authentication Method: Choose the appropriate authentication method. For more information refer to the chapter Security. If the Kerberos authentication method is selected, please refer to the chapter Kerberos.
6. Save the entry.
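For orientation, the parameters maintained in the steps above correspond to properties that the Oozie Web Services API expects as an XML configuration document when a job is submitted. The Python sketch below builds such a document. It assumes the standard Oozie layout of configuration, property, name and value elements; all host names, ports and the user are hypothetical, and the way the Oozie Connector itself assembles the request is not described in this guide.

```python
import xml.etree.ElementTree as ET


def build_job_configuration(properties):
    """Serialise job properties into the XML layout used by Oozie job submission."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


# Hypothetical values following the formats given in the configuration steps
job_properties = {
    "user.name": "glue",
    "queueName": "default",
    "nameNode": "hdfs://namenode.example.com:8020",
    "jobTracker": "resourcemanager.example.com:8032",
}
print(build_job_configuration(job_properties))
```

Using an XML library instead of string concatenation ensures that special characters in values, such as an ampersand, are escaped correctly.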
Security
User Authentication
Oozie Web Services have built-in security supporting different authentication mechanisms. The current version of the Datavard Oozie Connector supports the following authentication methods:
- NO_AUTHENTICATION: No Hadoop-specific authentication is applied
- PSEUDO_AUTHENTICATION: The user accessing Oozie is identified only by the user name
- KERBEROS: Kerberos authentication (for Kerberos setup, refer to chapter Kerberos)
The Datavard Oozie Connector also supports standard HTTP Basic authentication. In this case, user credentials are maintained within the corresponding RFC Destination.
Connection
It is recommended to set up the connection using the secured HTTP (HTTPS) protocol. For information on how to set up HTTPS on Hadoop, follow instructions provided by your Hadoop provider or distribution.
To set up HTTPS in your SAP system, follow the instructions at this link: https://help.sap.com/saphelp_nw70ehp2/helpdata/en/49/23501ebf5a1902e10000000a42189c/frameset.htm