(SM-1911) Troubleshooting and Error Log

The purpose of this page is to provide a reference for errors that users can encounter while using Hadoop storage, along with their possible resolutions.

SAP errors



ICM_HTTP_INTERNAL_ERROR

Generic ICM error. Check DEV_ICM logs in ST11 for details. 

  • In this example, the RFC destination is set as SSL inactive, but the HttpFS service is SSL enabled. Set the RFC as SSL active.
  • In this example, the RFC destination is set as SSL active, but no correct server or CA certificate is stored in STRUST, so trust can't be established. Add the correct certificate to STRUST.
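
A quick way to check whether the HttpFS service is actually SSL enabled is to probe it from the OS level. This is a sketch; the hostname is a placeholder for your HttpFS endpoint (14000 is the HttpFS default port):

# Prints the certificate chain if the service is SSL enabled;
# fails during the handshake if it is plain HTTP.
openssl s_client -connect httpfs.example.com:14000 </dev/null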

ICM_HTTP_CONNECTION_FAILED

Generic ICM error. Check DEV_ICM logs in ST11 for details. 

  • In this example, the service is not listening on the specified port. Make sure that the HttpFS service is available on that host and port and that no ports are blocked (see the connectivity check below).
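
To verify that the host and port are reachable from the SAP application server, a simple connectivity test can be run on the OS level (hostname and port are placeholders for your HttpFS endpoint):

# Succeeds only if something is listening on the port
# and no firewall is blocking the connection.
nc -vz httpfs.example.com 14000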

SSSLERR_NO_SSL_RESPONSE 

The HttpFS RFC destination in SM59 has the SSL checkbox set to Active, while the HttpFS service itself is not SSL secured. Set the checkbox to inactive.

No HDFS host available for connection

This error can be found in the job log. Make sure that the HTTP/HTTPS services are correctly set up in ICM and are active. Go to transaction SMICM → Goto → Services and make sure that both HTTP and HTTPS have a service name/port maintained.

SAP OS command 'xx' is different than expected

Datavard generates logical commands in SM69 based on the parameters provided in /DVD/JCO_MNG. If the path to the Java executable is changed in this transaction, it no longer matches the generated logical command.

Make sure that the path to the Java executable is in sync between /DVD/JCO_MNG and SM69.

Enter logon data pop-up in Storage management

This happens because without this parameter, SAP sends an incorrect authentication cookie and receives a 401 Unauthorized response. Make sure that the profile parameter ict/disable_cookie_urlencoding is set to '1' (or '2' on the latest releases).
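
For illustration, the corresponding line in the instance profile would look as follows (the parameter can also be checked in RZ11; use '2' only on releases that support it):

ict/disable_cookie_urlencoding = 1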

Java errors 

The Datavard Java connector is used for authentication with Kerberos and for executing SQL queries.

Logs from the Datavard Java connectors can be displayed either in transaction /DVD/JCO_MNG or by running the report /DVD/SM_HIVE_DISPLAY_JAVA_LOG.


Authentication errors

Java authentication errors can be very non-descriptive and differ between Java vendors. This section provides a couple of examples and their resolutions.

https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/errors.html is also a very good source for troubleshooting.

Login failure for 'xx' from keytab 'xx' - connection timed out

Most likely the KDC is unreachable. Try telnet to port 88 on the KDC (see the example after the stack trace below).

java.io.IOException: Login failure for dc1@SOFT.FAU from keytab /sapmnt/DC1/global/security/dvd_conn/dc1.keytab: javax.security.auth.login.FailedLoginException: Login error: java.net.SocketTimeoutException: connect timed out
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:962) ~[auth_conn.jar:?]
at com.datavard.http.authentication.KerberosAuthenticator.login(KerberosAuthenticator.java:75) ~[auth_conn.jar:?]
at com.datavard.http.authentication.KerberosAuthenticator.executeRequest(KerberosAuthenticator.java:38) ~[auth_conn.jar:?]
at com.datavard.http.authentication.DVDHttpDAO.send(DVDHttpDAO.java:24) [auth_conn.jar:?]
at com.datavard.http.handler.DVDExecuteHttpRequestHandler.handleRequest(DVDExecuteHttpRequestHandler.java:37) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker$FunctionDispatcher.handleRequest(DefaultServerWorker.java:1036) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker$FunctionDispatcher.handleRequest(DefaultServerWorker.java:972) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.dispatchRequest(DefaultServerWorker.java:148) [auth_conn.jar:?]
at com.sap.conn.jco.rt.MiddlewareJavaRfc$JavaRfcServer.dispatchRequest(MiddlewareJavaRfc.java:3415) [auth_conn.jar:?]
at com.sap.conn.jco.rt.MiddlewareJavaRfc$JavaRfcServer.listen(MiddlewareJavaRfc.java:2468) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.dispatch(DefaultServerWorker.java:254) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.loop(DefaultServerWorker.java:346) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.run(DefaultServerWorker.java:232) [auth_conn.jar:?]
at java.lang.Thread.run(Thread.java:811) [?:2.9 (12-15-2017)]
Caused by: javax.security.auth.login.FailedLoginException: Login error: java.net.SocketTimeoutException: connect timed out
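
A minimal reachability test from the SAP host, assuming kdc.example.com stands in for the KDC host from your krb5.conf:

# Port 88 is the standard Kerberos port; a hang followed by a
# timeout confirms the KDC is unreachable from this host.
telnet kdc.example.com 88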


Peer indicated failure: Error validating the login

Usually appears when username + password authentication is used (AuthMech=3). Make sure that the password is correct and hashed with /DVD/XOR_GEN, and that the correct authentication type is selected.

Unable to connect to server: Failure to initialize security context.

Either the path to krb5.conf is incorrect, or krb5.conf is not configured correctly.
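
For reference, a minimal krb5.conf sketch; the realm, domain, and KDC hostname below are placeholders and must match your actual Kerberos setup:

[libdefaults]
  default_realm = EXAMPLE.COM

[realms]
  EXAMPLE.COM = {
    kdc = kdc.example.com
    admin_server = kdc.example.com
  }

[domain_realm]
  .example.com = EXAMPLE.COM
  example.com = EXAMPLE.COM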

Error creating login context using ticket cache: Unable to obtain Principal name for authentication.

A very general error. It usually indicates an inconsistency between the information stored in the .jaas config file and reality (OS-level commands for the keytab checks follow the list).

  • Check if the path to the keytab in .jaas is correct
  • Check if the principal name in .jaas is correct
  • Check if the principal in the keytab is within its validity period
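
The keytab checks can be performed on the OS level with the standard Kerberos tools (the path and principal below are placeholders; compare them with the values in the .jaas file):

# List the principals and key timestamps contained in the keytab
klist -kte /sapmnt/<SID>/global/security/dvd_conn/<sid>.keytab

# Verify that a ticket can actually be obtained with the keytab
kinit -kt /sapmnt/<SID>/global/security/dvd_conn/<sid>.keytab <sid>@EXAMPLE.COM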

Unable to connect to server: GSS initiate failed

A very generic error. It can be an issue with the Kerberos config file (krb5.conf). Make sure that the principal's domain is correctly set in this configuration file.

Error creating login context using ticket cache: Login Failure: all modules ignored.

Usually refers to an incorrect .jaas file.

Example: You are running IBM Java, but the .jaas file was generated for Oracle Java.

Go to /DVD/JCO_MNG, set the correct Java vendor, delete the .jaas file on the OS level, and run Check Storage again. The correct file should be generated.

Please note that the .jaas file is only generated if Java is not running when Check Storage is performed.
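
For illustration, the vendor mismatch is visible in the login module class referenced inside the .jaas file. The sketches below are simplified (entry name, paths, and principal are placeholders; the generated files contain additional options):

Oracle Java:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/user.keytab"
  principal="user@EXAMPLE.COM";
};

IBM Java:

Client {
  com.ibm.security.auth.module.Krb5LoginModule required
  useKeytab="file:///path/to/user.keytab"
  principal="user@EXAMPLE.COM";
};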

Error initialized or created transport for authentication: java.io.IOException 'xx' (No such file or directory)

The .jaas file is not generated. Stop Java, make sure that the Cloudera drivers config path parameter is maintained in the storage definition, and run Check Storage. The .jaas file should be generated.

Unable to connect to server: Kerberos Authentication failed

Despite the message, the issue is not with Kerberos but with SSL. Make sure that SSL is set up correctly in the storage definition in /DVD/SM_SETUP.



Other errors

JDBC driver not initialized correctly. ClassNotFoundException

The path to the JDBC driver stored in SAP is either incorrect, or the provided driver classname is wrong. Check the logical paths and the physical location of the drivers, as well as the driver documentation for the correct classname (a quick check is shown below).
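
A quick way to confirm that the classname exists inside the driver jar (the jar name and classname below are illustrative; take the correct ones from the driver documentation):

# Lists the driver class if it is present in the jar
unzip -l HiveJDBC41.jar | grep -i HS2Driver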

Return code 2 while executing insert from select

Usually refers to a failed MapReduce job caused by insufficient resources.

Make sure that dynamic partitioning is set to nonstrict:

  • hive.exec.dynamic.partition.mode=nonstrict

Try adding the following parameters to the Storage Management field 'Hints for Hive/Impala' (separated by ';') to increase resources (see the combined example after this list):

  • mapred.child.java.opts=-Xmx4916m
  • hive.optimize.sort.dynamic.partition=true
  • hive.exec.max.dynamic.partitions=10000
  • hive.exec.max.dynamic.partitions.pernode=10000
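
In the storage definition, the hints are entered as a single string separated by ';', e.g.:

hive.exec.dynamic.partition.mode=nonstrict;mapred.child.java.opts=-Xmx4916m;hive.optimize.sort.dynamic.partition=true;hive.exec.max.dynamic.partitions=10000;hive.exec.max.dynamic.partitions.pernode=10000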


Name or password is incorrect

Java doesn't start, but logs a 'Name or password is incorrect' error.

This refers to an invalid username/password combination maintained in /DVD/JCO_MNG for the Java version. Make sure that the correct password is provided and that the user is not locked.


String index out of range

Usually refers to a typo in a password, or a password that is not properly hashed. All passwords (truststore password, Hive password, Impala password, ...) need to be hashed using the report /DVD/XOR_GEN.

Make sure that all hashed passwords are correct in /DVD/SM_SETUP and in table /DVD/HDP_CUS_C.



User 'xx' does not have privileges to access 'xx'

Sentry/Ranger permissions are not correctly set. Refer to the installation guide on how to set them and correct the setup.


Certificate chaining error

The system can't establish a secure TLS encrypted session. Make sure that the correct certificates are stored in the jssecacerts file (the Java truststore).
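
The truststore content can be inspected, and a missing CA certificate imported, with keytool (path, alias, certificate file, and store password are placeholders):

# List the certificates currently trusted by the Java connector
keytool -list -keystore /path/to/jssecacerts -storepass changeit

# Import the missing server/CA certificate
keytool -importcert -alias cluster-ca -file ca_cert.pem -keystore /path/to/jssecacerts -storepass changeit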


Error during closing of session - Session <ID> does not exist

Happens when two Java processes with the same program ID are running on the same application server. Kill one of the processes (see the example below). This issue is fixed in the latest versions of the Java connector.
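
To find the duplicate connector processes on the application server (the grep pattern is illustrative; auth_conn.jar is the connector jar seen in the stack traces above):

# Identify the Java connector processes and stop one of them
ps -ef | grep auth_conn.jar
kill <PID>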


Connection timeout expired

Usually caused by unavailability of the target service, or by network instability.

java.sql.SQLException: [Cloudera][HiveJDBCDriver](700100) Connection timeout expired. Details: None.


Couldn't acquire the DB log notification lock

The error is visible in the Java log. Because the SAP command output (report RSBTCOS0, SE38) cannot print a very long single-line entry, the following parsing command was created to break it into readable lines:

grep acquire app.log | tail -1 | awk -F':' '{for (i=1; i<=NF; i++) printf "%s\n", $i}'

As can be seen, the error not only shows a recommendation to increase the maximum number of retries for acquiring the Hive notification lock, but also carries a second error message:

Error executing SQL query "select "NEXT_EVENT_ID" from "NOTIFICATION_SEQUENCE" for update".

This is an attempt to acquire a DB lock on the NOTIFICATION_LOG table in PostgreSQL (the Metastore DB); in our case, the inability to do so was caused by DB overload. The overload was caused by a Ranger policy (Hortonworks distribution) applied to the /user/<sid>hdp directory.
Every time a temporary external table was created during loads from Datavard Outboard (rapid creation with parallel loads), the Ranger policy was triggered and had to update the metastore table TBL_COL_PRIVS for the new HDFS object.

The recommendation is to disable (or not create) a Ranger policy on the HDFS landing zone and instead allow the 'hive' user to read, write, and execute (rwx) on the directory /user/<sid>hdp by setting the group ownership to 'hadoop' (the default primary group of the hive user in Hortonworks), for example:
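
A sketch of that ownership change, to be run by a user with sufficient HDFS rights (<sid> as in your landing zone path):

# Hand the landing zone to the 'hadoop' group and grant it rwx
hdfs dfs -chgrp -R hadoop /user/<sid>hdp
hdfs dfs -chmod -R g+rwx /user/<sid>hdp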


Error in acquiring locks

The error visible in the Java log and the HiveServer2 log:

FAILED: Error in acquiring locks: Lock acquisition for 
LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, dbname:sapdvd_qb1, tablename:dv1_cnqb1000006, operationType:INSERT, 
isAcid:false, isDynamicPartitionWrite:true), 
LockComponent(type:SHARED_READ, level:TABLE, dbname:sapdvd_qb1, tablename:dvd_201905311537435165140, operationType:SELECT)], 
txnid:0, user:hive/HOSTNAME25.COM@DLAST.HDP.ETC.DLA.MIL, hostname:HOSTNAME25.COM, agentInfo:hive_20190531195433_8c2400cc-0f6e-4aea-8716-df5401b5c15d) 
timed out after 5506174ms. LockResponse(lockid:86547, state:WAITING)

This error happens at the step when CSV files are loaded to HDFS, a temporary external table is created, and an 'INSERT from SELECT' query is executed. The INSERT operation requires an exclusive lock, which could not be acquired.
It is unclear why the Hive table (dv1_cnqb1000006) was locked, but the lock lasted for more than 4 hours (3 repeated attempts, each timing out after roughly 1.5 hours = 5,500,000 ms).

The recommendation is to identify the source of the table lock via the 'SHOW LOCKS <TABLE_NAME>' query, for example:
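
For example via Beeline (the JDBC URL is a placeholder; the database and table names are taken from the error above):

beeline -u "jdbc:hive2://hiveserver.example.com:10000/sapdvd_qb1" -e "SHOW LOCKS dv1_cnqb1000006;"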

The lock in this scenario was caused by a misconfiguration of Hive: the ACID tables option was turned on, but no compaction was enabled.