(SM-2308) Common Errors

This page is a reference for errors you may encounter while using Hadoop storage, together with their possible resolutions.



SAP errors


ICM_HTTP_INTERNAL_ERROR

Generic ICM error. Check the DEV_ICM log in ST11 for details.

  • Example: the RFC has SSL set to inactive, but the HttpFS service is SSL-enabled. Set SSL to active in the RFC.
  • Example: the RFC has SSL set to active, but the correct server or CA certificate is not stored in STRUST, so trust cannot be established. Add the correct certificate to STRUST.
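To compare what the HttpFS endpoint actually presents with what is stored in STRUST, the certificate can be fetched at OS level. The following is a minimal Python sketch, not part of the SNP tooling: the function names are hypothetical, and `ssl.get_server_certificate` returns only the leaf certificate (without validating it), so an intermediate or root CA certificate has to be obtained separately if that is what STRUST needs.

```python
import hashlib
import ssl


def pem_sha256_fingerprint(pem: str) -> str:
    """Return the SHA-256 fingerprint of a PEM certificate as colon-separated hex."""
    der = ssl.PEM_cert_to_DER_cert(pem)  # strip the PEM armour and base64-decode
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_leaf_certificate(host: str, port: int) -> str:
    """Fetch the leaf certificate presented on host:port as PEM (no trust check)."""
    return ssl.get_server_certificate((host, port))
```

For example, `pem_sha256_fingerprint(fetch_leaf_certificate("httpfs.example.com", 14000))` prints the fingerprint of the certificate served by a hypothetical HttpFS host on the HttpFS default port 14000; the PEM text itself can be saved to a file for import.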

ICM_HTTP_CONNECTION_FAILED

Generic ICM error. Check the DEV_ICM log in ST11 for details.

  • Example: the service is not listening on the specified port. Make sure that the HttpFS service is available on that host and port and that the port is not blocked.
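To confirm outside of SAP that HttpFS answers on the configured host and port, a WebHDFS call can be sent to it directly. A minimal Python sketch, assuming the WebHDFS-compatible REST API that HttpFS serves under /webhdfs/v1 and simple authentication through the user.name parameter; the helper names, host and user are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request


def httpfs_url(host: str, port: int, op: str, path: str = "/",
               use_tls: bool = False, **params: str) -> str:
    """Build a WebHDFS REST URL (the API HttpFS exposes) for operation `op`."""
    scheme = "https" if use_tls else "http"
    query = urllib.parse.urlencode({"op": op, **params})
    return f"{scheme}://{host}:{port}/webhdfs/v1{urllib.parse.quote(path)}?{query}"


def probe_httpfs(host: str, port: int, user: str, use_tls: bool = False,
                 timeout: float = 10.0) -> dict:
    """Call GETHOMEDIRECTORY and return the JSON answer.

    A Kerberos-secured HttpFS rejects simple authentication with HTTP 401
    (urllib raises HTTPError), which still shows that an HTTP service answers there.
    A connection error means nothing is listening or the port is blocked.
    """
    url = httpfs_url(host, port, "GETHOMEDIRECTORY", use_tls=use_tls,
                     **{"user.name": user})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `probe_httpfs("httpfs.example.com", 14000, "dc1")` targets a hypothetical host on the HttpFS default port 14000.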

SSSLERR_NO_SSL_RESPONSE 

The HttpFS RFC destination in SM59 has SSL set to Active, while the HttpFS service itself is not SSL-secured. Set SSL to Inactive.
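Whether the HttpFS port actually speaks TLS can be checked before changing the SSL setting of the RFC. A minimal Python sketch with a hypothetical function name; certificate validation is deliberately switched off because only the protocol is being tested, not trust.

```python
import socket
import ssl


def speaks_tls(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the service on host:port completes a TLS handshake.

    Raises OSError if the TCP connection cannot be opened at all, or if the
    handshake times out, since neither case says anything about TLS.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE     # only the protocol matters here
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        with ctx.wrap_socket(raw, server_hostname=host):
            return True
    except (ssl.SSLError, ConnectionError):
        # A plain HTTP service answers the ClientHello with non-TLS bytes
        # or drops the connection.
        return False
    finally:
        raw.close()  # no-op if wrap_socket already took over the socket
```

If `speaks_tls` returns False for the HttpFS host and port, SSL should be Inactive in the RFC; if it returns True, Active.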

No HDFS host available for connection

This error appears in the job log. Make sure that the HTTP/HTTPS services are correctly configured and active in ICM: in transaction SMICM, choose Goto > Services and check that both HTTP and HTTPS have a service name/port maintained.

SAP OS command 'xx' is different than expected

SNP generates logical commands in SM69 based on the parameters provided in /DVD/JCO_MNG. If the path to the Java executable is later changed in /DVD/JCO_MNG, it no longer matches the generated logical command.
Make sure that the path to the Java executable is the same in /DVD/JCO_MNG and SM69.

Enter logon data pop-up in Storage management

Make sure that the profile parameter ict/disable_cookie_urlencoding is set to 1 (or 2 on the latest releases). Without this parameter, SAP sends an incorrectly encoded authentication cookie, receives a 401 Unauthorized response, and shows the logon pop-up.

Java errors 

The Java connector is used for Kerberos authentication and for executing SQL queries.
Java connector logs can be displayed either in transaction /DVD/JCO_MNG or by running the report /DVD/SM_HIVE_DISPLAY_JAVA_LOG.

Authentication errors

Java authentication errors can be very non-descriptive and differ between Java vendors (for example IBM and Oracle Java). This section provides a few examples and their resolutions.
https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/errors.html is also a very good troubleshooting resource.

Login failure for 'xx' from keytab 'xx' - connection timed out

Most likely the KDC is unreachable. Test the connection with telnet to port 88 on the KDC.

java.io.IOException: Login failure for dc1@SOFT.FAU from keytab /sapmnt/DC1/global/security/dvd_conn/dc1.keytab: javax.security.auth.login.FailedLoginException: Login error: java.net.SocketTimeoutException: connect timed out
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:962) ~[auth_conn.jar:?]
at com.datavard.http.authentication.KerberosAuthenticator.login(KerberosAuthenticator.java:75) ~[auth_conn.jar:?]
at com.datavard.http.authentication.KerberosAuthenticator.executeRequest(KerberosAuthenticator.java:38) ~[auth_conn.jar:?]
at com.datavard.http.authentication.DVDHttpDAO.send(DVDHttpDAO.java:24) [auth_conn.jar:?]
at com.datavard.http.handler.DVDExecuteHttpRequestHandler.handleRequest(DVDExecuteHttpRequestHandler.java:37) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker$FunctionDispatcher.handleRequest(DefaultServerWorker.java:1036) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker$FunctionDispatcher.handleRequest(DefaultServerWorker.java:972) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.dispatchRequest(DefaultServerWorker.java:148) [auth_conn.jar:?]
at com.sap.conn.jco.rt.MiddlewareJavaRfc$JavaRfcServer.dispatchRequest(MiddlewareJavaRfc.java:3415) [auth_conn.jar:?]
at com.sap.conn.jco.rt.MiddlewareJavaRfc$JavaRfcServer.listen(MiddlewareJavaRfc.java:2468) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.dispatch(DefaultServerWorker.java:254) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.loop(DefaultServerWorker.java:346) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.run(DefaultServerWorker.java:232) [auth_conn.jar:?]
at java.lang.Thread.run(Thread.java:811) [?:2.9 (12-15-2017)]
Caused by: javax.security.auth.login.FailedLoginException: Login error: java.net.SocketTimeoutException: connect timed out
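Where telnet is not installed on the application server, the same TCP check can be done with a short Python sketch (hypothetical function name). It only tests TCP: KDCs usually also listen on UDP port 88, which a connect test cannot verify. Take the KDC host names from the kdc entries in krb5.conf.

```python
import socket


def port_open(host: str, port: int = 88, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like a telnet test)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `port_open("kdc.example.com")` for a hypothetical KDC host. A False result together with the SocketTimeoutException above points to a firewall or routing problem between the application server and the KDC.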

Peer indicated failure: Error validating the login

Usually appears when user name and password authentication is used (AuthMech=3). Make sure that the password is correct and hashed with /DVD/XOR_GEN, and that the correct authentication type is selected.
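For orientation, this is roughly where the authentication mechanism appears in a Cloudera JDBC connection URL: AuthMech=3 selects user name and password, AuthMech=1 selects Kerberos. The helper below is hypothetical and only illustrates the URL layout; check any further property names against the Cloudera driver documentation.

```python
def cloudera_jdbc_url(host: str, port: int, database: str = "default",
                      auth_mech: int = 3, subprotocol: str = "hive2",
                      **props: object) -> str:
    """Assemble a Cloudera-style JDBC URL: jdbc:<sub>://host:port/db;Key=Value;...

    auth_mech: 1 = Kerberos, 3 = user name and password.
    subprotocol: "hive2" for the Hive driver, "impala" for the Impala driver.
    """
    settings = {"AuthMech": auth_mech, **props}
    tail = "".join(f";{key}={value}" for key, value in settings.items())
    return f"jdbc:{subprotocol}://{host}:{port}/{database}{tail}"
```

For example, `cloudera_jdbc_url("hs2.example.com", 10000, UID="dc1")` yields `jdbc:hive2://hs2.example.com:10000/default;AuthMech=3;UID=dc1` for a hypothetical HiveServer2 host.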

Unable to connect to server: Failure to initialize security context.

Either the path to krb5.conf is incorrect, or krb5.conf is not configured correctly.

Error creating login context using ticket cache: Unable to obtain Principal name for authentication.

A very general error. It usually means that the information stored in the .jaas configuration file does not match the actual setup.

  • Check that the keytab path in the .jaas file is correct
  • Check that the principal name in the .jaas file is correct
  • Check that the principal in the keytab is within its validity period
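The first two checks can be scripted against the .jaas file. A minimal Python sketch with hypothetical function names, assuming the usual option syntax: Oracle-style entries use keyTab=<path>, IBM-style entries use useKeytab=<file URL>. The entries in the keytab itself can be listed with `klist -k -t <keytab>` (MIT Kerberos).

```python
import os
import re
from urllib.parse import urlparse

# key="value" or key=value options inside a JAAS login-module entry.
_OPTION = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|([^\s;"]+))')


def jaas_options(text: str) -> dict:
    """Collect the options of a .jaas file into a dict (last occurrence wins)."""
    options = {}
    for match in _OPTION.finditer(text):
        quoted, bare = match.group(2), match.group(3)
        options[match.group(1)] = quoted if quoted is not None else bare
    return options


def check_jaas(text: str) -> list:
    """Return a list of problems found in the keytab and principal options."""
    options = jaas_options(text)
    problems = []
    keytab = options.get("keyTab") or options.get("useKeytab")
    if not keytab:
        problems.append("no keyTab/useKeytab option")
    else:
        # IBM-style values are file URLs such as file:///path/to/file.keytab.
        path = urlparse(keytab).path if keytab.startswith("file:") else keytab
        if not os.path.isfile(path):
            problems.append(f"keytab {path} does not exist")
    principal = options.get("principal")
    if not principal or "@" not in principal:
        problems.append("principal missing or without @REALM")
    return problems
```

An empty list means both options are present and the keytab file exists; it does not prove that the keytab matches the principal.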

Unable to connect to server: GSS initiate failed

A very generic error that can be caused by the Kerberos configuration file. Make sure that the principal's domain is correctly set in this file.
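Host-to-realm mapping in krb5.conf lives in the [domain_realm] section. The Python sketch below is a simplified approximation with hypothetical function names: it tries an exact host entry first, then the longest matching ".domain" entry, and ignores the further fallbacks (such as default_realm) that real Kerberos libraries apply.

```python
def parse_domain_realm(krb5_conf: str) -> dict:
    """Return the [domain_realm] entries of a krb5.conf as {host_or_.domain: REALM}."""
    mapping, section = {}, None
    for raw in krb5_conf.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
        elif section == "domain_realm" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            mapping[key.lower()] = value
    return mapping


def realm_for_host(host: str, mapping: dict):
    """Exact host entry first, then the longest matching .domain entry, else None."""
    host = host.lower().rstrip(".")
    if host in mapping:
        return mapping[host]
    labels = host.split(".")
    for i in range(1, len(labels)):
        suffix = "." + ".".join(labels[i:])
        if suffix in mapping:
            return mapping[suffix]
    return None


def principal_realm_matches(principal: str, host: str, mapping: dict) -> bool:
    """True if the principal's @REALM is the realm krb5.conf maps the host to."""
    return realm_for_host(host, mapping) == principal.rpartition("@")[2]
```

A False result from `principal_realm_matches` for the Hive host and the configured principal is a hint to review [domain_realm].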

Error creating login context using ticket cache: Login Failure: all modules ignored.

Usually means that the .jaas file is incorrect, for example when IBM Java is running but the .jaas file was generated for Oracle Java.

Go to /DVD/JCO_MNG, set the correct Java vendor, delete the .jaas file at OS level, and click Check Storage again. The correct file should then be generated.

Note that the .jaas file is only generated if Java is not running at the time Check Storage is performed.
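The vendor mismatch can be spotted from the login module class named in the .jaas file: Oracle-style files reference com.sun.security.auth.module.Krb5LoginModule, IBM-style files com.ibm.security.auth.module.Krb5LoginModule. A Python sketch with hypothetical function names; the assumption that a classic IBM SDK reports "IBM J9 VM" in its `java -version` output, and that any other Java expects the Oracle-style module, is a heuristic only.

```python
# Kerberos login modules referenced in .jaas files, by Java vendor.
LOGIN_MODULES = {
    "com.sun.security.auth.module.Krb5LoginModule": "oracle",
    "com.ibm.security.auth.module.Krb5LoginModule": "ibm",
}


def jaas_vendor(jaas_text: str):
    """Return "oracle" or "ibm" depending on the login module, or None if unclear."""
    found = {vendor for module, vendor in LOGIN_MODULES.items() if module in jaas_text}
    return found.pop() if len(found) == 1 else None


def jaas_matches_java(jaas_text: str, java_version_output: str) -> bool:
    """Heuristic: classic IBM SDKs report "IBM J9 VM" in `java -version` output."""
    expected = "ibm" if "IBM J9 VM" in java_version_output else "oracle"
    return jaas_vendor(jaas_text) == expected
```

Pass the text of the .jaas file and the captured output of `java -version` for the Java executable configured in /DVD/JCO_MNG.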

Error initialized or created transport for authentication: java.io.IOException 'xx' (No such file or directory)

The .jaas file has not been generated. Stop Java, make sure that the Cloudera drivers config path parameter is maintained in the storage definition, and click Check Storage. The .jaas file should then be generated.

Unable to connect to server: Kerberos Authentication failed

Despite the message, the issue is with SSL rather than Kerberos. Make sure that SSL is set correctly in the storage definition in /DVD/SM_SETUP.


Other errors

JDBC driver not initialized correctly. ClassNotFoundException

Either the path to the JDBC driver stored in SAP is incorrect, or the driver class name provided is wrong. Check the logical paths and the physical location of the drivers, and look up the correct class name in the driver documentation.
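A jar file is a zip archive, so whether a driver directory really contains the configured class can be checked at OS level. A Python sketch with hypothetical function names; com.example.jdbc.Driver in the usage below is a placeholder, and the real class name comes from the driver documentation.

```python
import os
import zipfile


def jar_has_class(jar_path: str, class_name: str) -> bool:
    """True if the jar contains the compiled class, e.g. com.example.jdbc.Driver."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def jars_with_class(directory: str, class_name: str) -> list:
    """List the .jar files in `directory` that contain `class_name`.

    Raises FileNotFoundError if the directory does not exist, which itself
    points to a wrong physical path behind the logical path.
    """
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith(".jar")
        and jar_has_class(os.path.join(directory, name), class_name)
    )
```

For example, `jars_with_class("/path/to/drivers", "com.example.jdbc.Driver")` returning an empty list means that the class name or the directory is wrong.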

Return code 2 while executing insert from select

Usually indicates a MapReduce job that failed due to insufficient resources.

Make sure that dynamic partitioning is set to nonstrict:

  • hive.exec.dynamic.partition.mode=nonstrict

To increase resources, try adding the following parameters to the storage management field Hints for Hive/Impala (separated by ;):

  • mapred.child.java.opts=-Xmx4916m
  • hive.optimize.sort.dynamic.partition=true
  • hive.exec.max.dynamic.partitions=10000
  • hive.exec.max.dynamic.partitions.pernode=10000

The following hints proved helpful for SNP OutBoard™ loads with package sizes above 500 MB:

  • mapreduce.map.java.opts=-Xmx6144m;mapreduce.reduce.java.opts=-Xmx6144m
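Because the hints field is a single string of key=value pairs separated by ;, a missing or doubled separator is easy to introduce when combining the hints above. A hypothetical Python helper (not part of the SNP tooling) that assembles such a string and parses it back for review:

```python
def format_hints(hints: dict) -> str:
    """Join key=value hints with ';' as expected by the hints field."""
    return ";".join(f"{key}={value}" for key, value in hints.items())


def parse_hints(field: str) -> dict:
    """Split a hints field into a dict; empty items (e.g. a trailing ';') are skipped."""
    hints = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")  # split at the first '=' only
        if not sep:
            raise ValueError(f"hint without '=': {item!r}")
        hints[key.strip()] = value.strip()
    return hints
```

For example, the two OutBoard hints above become `mapreduce.map.java.opts=-Xmx6144m;mapreduce.reduce.java.opts=-Xmx6144m`.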


Impala does not have WRITE access to HDFS location: hdfs://<HDFS_path>/READ_DVD_<timestamp>

Usually observed with SNP OutBoard™ implementations when Impala is used as the read engine: the storage connection is successful and offloading data from SAP to Hive works fine, but the verification fails.
The error is caused by delayed synchronization of Sentry rules and HDFS ACLs. When the temporary table for the read output is created, the subsequent attempt to write data into it as the Impala user fails.

A known workaround is to set the Impala hint SYNC_DDL=true, which slows down some DDL/DML statements.
More info: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_sync_ddl.html

Name or password is incorrect

Java does not start, and the log shows the error "Name or password is incorrect".

This refers to an invalid user name and password combination maintained in /DVD/JCO_MNG for the Java connector. Make sure that the correct password is provided and that the user is not locked.



String index out of range

Usually caused by a typo in the password or by a password that is not properly hashed. All passwords (truststore password, Hive password, Impala password, etc.) must be hashed using the report /DVD/XOR_GEN.

Make sure that all hashed passwords are correct in /DVD/SM_SETUP and table /DVD/HDP_CUS_C.


User 'xx' does not have privileges to access 'xx'

Sentry/Ranger permissions are not set correctly. Refer to the installation guide for how to set them, and correct the setup.


Certificate chaining error

The system cannot establish a TLS-encrypted session. Make sure that the correct certificates are stored in the jssecacerts file (Java truststore).


Error during the closing of the session - Session <ID> does not exist

This happens when two Java processes with the same program ID are running on the same application server. Kill one of the processes. This issue is fixed in the latest versions of the Java connector.


Connection timeout expired

Usually caused by the unavailability of a target service or network instability.

java.sql.SQLException: [Cloudera][HiveJDBCDriver](700100) Connection timeout expired. Details: None.


Couldn't acquire the DB log notification lock

The error is visible in the Java log. Because the SAP CLI cannot print a very long single-line entry, the log was parsed with the following command, run via the SAP report RSBTCOS0 (SE38):

grep acquire app.log | tail -1 | awk -F':' '{for (i = 1; i <= NF; i++) print $i}'
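The same extraction (take the last line containing "acquire" and print it split at ":", one field per line) can be written as a small Python sketch where awk is not available; the function name is hypothetical.

```python
def last_match_fields(lines, needle: str = "acquire", sep: str = ":") -> list:
    """Split the last line containing `needle` at `sep` (like grep | tail -1 | awk -F)."""
    last = None
    for line in lines:
        if needle in line:
            last = line.rstrip("\n")
    return [] if last is None else last.split(sep)
```

Usage: `with open("app.log") as log: print("\n".join(last_match_fields(log)))`.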

The output not only recommends increasing the maximum number of retries for acquiring the Hive notification lock, but also contains a second error message:

Error executing SQL query "select "NEXT_EVENT_ID" from "NOTIFICATION_SEQUENCE" for update".

This query tries to acquire a DB lock for the NOTIFICATION_LOG table in PostgreSQL (the Metastore DB) and fails. In our case, this was caused by DB overload, which in turn was caused by a Ranger policy (Hortonworks distribution) applied to the /user/<sid>hdp directory.
Every time a temporary external table was created during loads from SNP OutBoard™ (rapid creation with parallel loads), the Ranger policy was triggered and had to update the metastore table TBL_COL_PRIVS for the new HDFS object.

The recommendation is to not create (or to disable) a Ranger policy on the HDFS landing zone, and instead allow the 'hive' user to read, write and execute (rwx) on the directory /user/<sid>hdp by setting its group ownership to the group 'hadoop' (the default primary group of the hive user in Hortonworks) with rwx permissions for that group.


Error in acquiring locks

The error is visible in the Java log and the HiveServer2 log:

FAILED: Error in acquiring locks: Lock acquisition for 
LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, dbname:sapdvd_qb1, tablename:dv1_cnqb1000006, operationType:INSERT, 
isAcid:false, isDynamicPartitionWrite:true), 
LockComponent(type:SHARED_READ, level:TABLE, dbname:sapdvd_qb1, tablename:dvd_201905311537435165140, operationType:SELECT)], 
txnid:0, user:hive/HOSTNAME25.COM@DLAST.HDP.ETC.DLA.MIL, hostname:HOSTNAME25.COM, agentInfo:hive_20190531195433_8c2400cc-0f6e-4aea-8716-df5401b5c15d) 
timed out after 5506174ms. LockResponse(lockid:86547, state:WAITING)

This error occurs when CSV files are loaded to HDFS, a temporary external table is created, and the INSERT from SELECT query is executed. The INSERT operation requires an exclusive lock, which could not be acquired.
It is unclear why the Hive table (dv1_cnqb1000006) was locked, but the lock lasted for more than 4 hours (three repeated attempts, each timing out after roughly 1.5 hours, i.e. 5,506,174 ms).

The recommendation is to identify the source of the table lock via the SHOW LOCKS <TABLE_NAME> query.

In this scenario, the lock was caused by a Hive misconfiguration: ACID tables were enabled, but compaction was not.


JDBC connection hanging with Impala storage

Symptom: a delay at the beginning of the extraction process, before any data is processed and replicated.

10.12.2019 01:00:01 Read filter for variant 'ZDVD_GLUE_V1'.
10.12.2019 01:15:48 Start of extraction: '01:15:48'.

Example of a delay of almost 16 minutes before the start of the extraction.

Impala logs

1210 01:03:50.986474 32599 ImpaladCatalog.java:202] Adding: CATALOG_SERVICE_ID version: 173983 size: 49
1210 01:03:50.986577 32599 impala-server.cc:1433] Catalog topic update applied with version: 173983 new min catalog object version: 167996
1210 01:04:51.007951 32599 ImpaladCatalog.java:202] Adding: PRIVILEGE:server=server1->db=db1->grantoption=false.51 version: 173999 size: 111

The Impala catalog update took one minute because many connections were open at the same time.

This can happen when connection pooling is turned on in the settings and the Java connector is deployed on multiple SAP application servers. For example, with a pool of 10 connections on each of 10 SAP application servers, the first data replication on each application server opens its pool against the Impala service on Hadoop, 100 connections in total. This creates a heavy load on the database that stores the Impala metadata (typically MySQL or similar) and leads to performance issues and delays (catalog refresh, metadata refresh, network issues, impact on other applications, etc.).

The recommendation is to size connection pooling according to the number of SAP application servers used, or to use it only with a central Java connector (deployed on a single SAP application server).

The connection pooling in the Java connector uses a simple Object Pool pattern: all JDBC connections are created up front, up to the configured pool size.
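The eager creation described above can be illustrated with a minimal object pool in Python. This is a sketch of the pattern only, not the Java connector's code; `EagerPool` and the factory are hypothetical. With a pool size of 10 on each of 10 application servers, 10 × 10 = 100 connections are opened as soon as the pools are initialized, before most of them are ever used.

```python
import queue
from contextlib import contextmanager


class EagerPool:
    """Minimal object pool: every object is created up front, then borrowed and returned."""

    def __init__(self, factory, size: int):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # all connections are opened at initialization

    @contextmanager
    def borrow(self, timeout=None):
        """Hand out an idle object; blocks (or raises queue.Empty after timeout) if none is idle."""
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the object to the pool

    def idle_count(self) -> int:
        return self._idle.qsize()
```

In this design the connection count is fixed by configuration rather than by demand, which is why the pool size has to be chosen per application server.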


Error creating login context using ticket cache: Unable to obtain Principal Name for authentication

This is a generic error that occurs when connecting to Hive services. A connection attempt to the HDFS storage itself returns a different error message: javax.security.auth.login.LoginException: no supported default etypes for default_tkt_enctypes.

Reason: either a newer JRE is not compatible with the cipher suite(s) enabled in kerberos.cfg, or the enabled cipher suite(s) are too strong for an obsolete JRE version.
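The message means that none of the encryption types listed in default_tkt_enctypes can be used by the running JRE. The simplified Python sketch below (hypothetical function names, krb5.conf syntax assumed) lists the configured types and flags only the two cases named above: single-DES types, which current JREs reject unless allow_weak_crypto = true, and AES-256 types, which older Oracle JREs (before 8u161) only support with the unlimited-strength JCE policy. The classification is not exhaustive.

```python
ENCTYPE_KEYS = ("default_tkt_enctypes", "default_tgs_enctypes", "permitted_enctypes")


def libdefaults_enctypes(krb5_conf: str) -> dict:
    """Return the enctype lists configured in the [libdefaults] section."""
    result, section = {}, None
    for raw in krb5_conf.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
        elif section == "libdefaults" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key in ENCTYPE_KEYS:
                # Entries may be separated by spaces or commas.
                result[key] = value.replace(",", " ").split()
    return result


def jre_caveat(enctype: str) -> str:
    """Rough JRE-compatibility note for one enctype (simplified)."""
    name = enctype.lower()
    if name.startswith("des-"):
        return "single-DES: rejected by current JREs unless allow_weak_crypto = true"
    if name.startswith("aes256-"):
        return "AES-256: needs the unlimited-strength JCE policy on Oracle JREs before 8u161"
    return "no JRE caveat known to this sketch"
```

If every type in default_tkt_enctypes carries a caveat that applies to the JRE in use, this error is the expected result.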