(SM-2205) Common errors
This page lists errors you may encounter while using Hadoop storage, together with their possible resolutions.
SAP errors
ICM_HTTP_INTERNAL_ERROR
Generic ICM error. Check DEV_ICM logs in ST11 for details.
- Example: the RFC is set as SSL inactive, but the HttpFS service is SSL enabled. Set the RFC as SSL active.
- Example: the RFC is set as SSL active, but no correct server or CA certificate is stored in STRUST, so trust can't be established. Add the correct certificate to STRUST.
ICM_HTTP_CONNECTION_FAILED
Generic ICM error. Check DEV_ICM logs in ST11 for details.
- Example: the service is not listening on the specified port. Make sure that the HttpFS service is available on that host and port and that no ports are blocked.
SSSLERR_NO_SSL_RESPONSE
The HttpFS RFC destination in SM59 has the SSL checkbox set to Active, while the HttpFS service itself is not SSL secured. Set the checkbox to Inactive.
No HDFS host available for connection
This message appears in the job log. Make sure that the HTTP/HTTPS services are correctly set in ICM and active. Go to t-code SMICM → Go to → Services and make sure that both HTTP and HTTPS have a service name/port maintained.
SAP OS command 'xx' is different than expected
SNP generates logical commands in SM69 based on the parameters provided in /DVD/JCO_MNG. If the path to the Java executable is changed in /DVD/JCO_MNG afterwards, it no longer matches the generated logical command.
Make sure that the path to the Java executable is in sync between /DVD/JCO_MNG and SM69.
Enter logon data pop-up in Storage management
This pop-up appears because, without the parameter ict/disable_cookie_urlencoding, SAP sends an incorrect authentication cookie and receives a 401 Unauthorized response. Make sure that the parameter is set to '1' (or '2' on the latest releases).
Java errors
The Java connector is used for Kerberos authentication and for executing SQL queries.
Logs from Java connectors can be displayed either in transaction /DVD/JCO_MNG or by running report /DVD/SM_HIVE_DISPLAY_JAVA_LOG.
Authentication errors
Java authentication error messages are often non-descriptive and differ between Java vendors. This section provides several examples and their resolutions.
https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/errors.html is also a very good source for troubleshooting.
Login failure for 'xx' from keytab 'xx' - connection timed out
Most likely the KDC is unreachable. Try to telnet to port 88 on the KDC host.
java.io.IOException: Login failure for dc1@SOFT.FAU from keytab /sapmnt/DC1/global/security/dvd_conn/dc1.keytab: javax.security.auth.login.FailedLoginException: Login error: java.net.SocketTimeoutException: connect timed out
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:962) ~[auth_conn.jar:?]
at com.datavard.http.authentication.KerberosAuthenticator.login(KerberosAuthenticator.java:75) ~[auth_conn.jar:?]
at com.datavard.http.authentication.KerberosAuthenticator.executeRequest(KerberosAuthenticator.java:38) ~[auth_conn.jar:?]
at com.datavard.http.authentication.DVDHttpDAO.send(DVDHttpDAO.java:24) [auth_conn.jar:?]
at com.datavard.http.handler.DVDExecuteHttpRequestHandler.handleRequest(DVDExecuteHttpRequestHandler.java:37) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker$FunctionDispatcher.handleRequest(DefaultServerWorker.java:1036) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker$FunctionDispatcher.handleRequest(DefaultServerWorker.java:972) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.dispatchRequest(DefaultServerWorker.java:148) [auth_conn.jar:?]
at com.sap.conn.jco.rt.MiddlewareJavaRfc$JavaRfcServer.dispatchRequest(MiddlewareJavaRfc.java:3415) [auth_conn.jar:?]
at com.sap.conn.jco.rt.MiddlewareJavaRfc$JavaRfcServer.listen(MiddlewareJavaRfc.java:2468) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.dispatch(DefaultServerWorker.java:254) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.loop(DefaultServerWorker.java:346) [auth_conn.jar:?]
at com.sap.conn.jco.rt.DefaultServerWorker.run(DefaultServerWorker.java:232) [auth_conn.jar:?]
at java.lang.Thread.run(Thread.java:811) [?:2.9 (12-15-2017)]
Caused by: javax.security.auth.login.FailedLoginException: Login error: java.net.SocketTimeoutException: connect timed out
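The telnet check mentioned above can also be scripted. The following is a minimal sketch, not part of the product: the function name can_connect and the host name in the usage note are hypothetical, and it only tests whether a TCP connection to the KDC port can be opened from the machine where it runs.

```python
import socket


def can_connect(host: str, port: int = 88, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, can_connect("kdc.example.com") would be run on the SAP application server that hosts the Java connector, with the placeholder replaced by the KDC host from your krb5.conf. A False result points to a network or firewall problem rather than to a wrong keytab.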
Peer indicated failure: Error validating the login
Usually appears when username and password authentication is used (AuthMech=3). Make sure that the password is correct and hashed with /DVD/XOR_GEN. Also make sure that the correct authentication type is selected.
Unable to connect to server: Failure to initialize security context.
Either the path to krb5.conf is incorrect, or krb5.conf is not configured correctly.
Error creating login context using ticket cache: Unable to obtain Principal name for authentication.
A very general error. It usually indicates a mismatch between the information stored in the .jaas config file and the actual environment.
- Check if the path to the keytab in .jaas is correct
- Check if the principal name in .jaas is correct
- Check if the principal in the keytab is within its validity period
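The first two checks can be partly automated. The sketch below assumes that the .jaas file uses the standard JAAS syntax with the keyTab and principal options of the Oracle/OpenJDK Krb5LoginModule; other vendors' modules may use different option names, and the layout of the generated file may differ on your system. The function names are hypothetical. The validity period of the principal is not covered here.

```python
import os
import re

# Matches key=value or key="value" options inside a JAAS login entry.
_OPTION_RE = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|([^\s;"]+))')


def read_jaas_options(jaas_text: str) -> dict:
    """Collect option names and values from JAAS configuration text."""
    options = {}
    for key, quoted, bare in _OPTION_RE.findall(jaas_text):
        options[key] = quoted if quoted else bare
    return options


def inspect_jaas(jaas_text: str) -> dict:
    """Report the keytab path, whether that file exists, and the principal."""
    options = read_jaas_options(jaas_text)
    keytab = options.get("keyTab")  # assumption: Oracle/OpenJDK option name
    return {
        "keytab": keytab,
        "keytab_exists": bool(keytab) and os.path.isfile(keytab),
        "principal": options.get("principal"),
    }
```

Run it on the OS user that starts the Java connector, so that the file-existence check reflects the permissions the connector actually has, and compare the reported principal with the one you expect.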
Unable to connect to server: GSS initiate failed
A very generic error. It can be caused by the Kerberos configuration file; make sure that the principal's domain is set correctly in this file.
Error creating login context using ticket cache: Login Failure: all modules ignored.
Usually refers to an incorrect .jaas file.
Example: you are running IBM Java, but the .jaas file was generated for Oracle Java.
Go to /DVD/JCO_MNG, set the correct Java vendor, delete the .jaas file at the OS level, and click Check Storage again. The correct file should be generated.
Note that the .jaas file is generated only if Java is not running when Check Storage is performed.
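A vendor mismatch like the one in the example above can often be spotted from the login module class named in the .jaas file. The sketch below assumes that the generated file references one of the two well-known Kerberos login module classes (com.sun... for Oracle/OpenJDK, com.ibm... for IBM Java); the function names are hypothetical and the generated file may look different on your system.

```python
ORACLE_MODULE = "com.sun.security.auth.module.Krb5LoginModule"
IBM_MODULE = "com.ibm.security.auth.module.Krb5LoginModule"


def jaas_vendor(jaas_text: str) -> str:
    """Guess the Java vendor a JAAS file targets from its login module class."""
    if IBM_MODULE in jaas_text:
        return "IBM"
    if ORACLE_MODULE in jaas_text:
        return "Oracle"
    return "unknown"


def jaas_matches_vendor(jaas_text: str, configured_vendor: str) -> bool:
    """Compare the guessed vendor with the vendor you configured (case-insensitive)."""
    return jaas_vendor(jaas_text).lower() == configured_vendor.strip().lower()
```

If jaas_matches_vendor returns False for the vendor set in /DVD/JCO_MNG, regenerating the file as described above is the likely fix.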
Error initialized or created transport for authentication: java.io.IOException 'xx' (No such file or directory)
The .jaas file has not been generated. Stop Java, make sure that the Cloudera drivers config path parameter is maintained in the storage, and click Check Storage. The .jaas file should then be generated.
Unable to connect to server: Kerberos Authentication failed
The issue is not with Kerberos but with SSL. Make sure that you have set SSL correctly in storage definition in /DVD/SM_SETUP.
Other errors
JDBC driver not initialized correctly. ClassNotFoundException
Either the path to the JDBC driver stored in SAP is incorrect, or the driver class name provided is wrong. Check the logical paths and the physical location of the drivers, and consult the driver documentation for the correct class name.
Return code 2 while executing insert from select
Usually refers to a MapReduce job that failed due to insufficient resources.
Make sure that dynamic partitioning is set to nonstrict:
- hive.exec.dynamic.partition.mode=nonstrict
To increase resources, try adding the following parameters (separated by ;) to the storage management field Hints for Hive/Impala:
- mapred.child.java.opts=-Xmx4916m
- hive.optimize.sort.dynamic.partition=true
- hive.exec.max.dynamic.partitions=10000
- hive.exec.max.dynamic.partitions.pernode=10000;
The following hints proved helpful for OutBoard loads with a package size > 500 MB:
- mapreduce.map.java.opts=-Xmx6144m;mapreduce.reduce.java.opts=-Xmx6144m
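Because the hints are entered as one semicolon-separated string, it can help to assemble that string from individual settings. This is a minimal sketch: build_hints is a hypothetical helper, the values are the ones listed above, and whether the field expects a trailing semicolon is not confirmed here, so none is added.

```python
def build_hints(settings: dict) -> str:
    """Join key=value pairs into one ';'-separated hints string."""
    return ";".join(f"{key}={value}" for key, value in settings.items())


hints = build_hints({
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "mapred.child.java.opts": "-Xmx4916m",
    "hive.optimize.sort.dynamic.partition": "true",
    "hive.exec.max.dynamic.partitions": "10000",
    "hive.exec.max.dynamic.partitions.pernode": "10000",
})
print(hints)
```

The printed string can then be pasted into the hints field in one piece, which avoids missing or doubled separators.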
Impala does not have WRITE access to HDFS location: hdfs://<HDFS_path>/READ_DVD_<timestamp>
Usually observed with SNP OutBoard™ implementation. Storage connection is successful, the offloading of data from SAP to Hive works fine, but the verification fails (with Impala being used as read engine).
The error appears due to delayed synchronization of Sentry rules and HDFS ACLs. When a temporary table designated for read output is created, a subsequent attempt to write data as an Impala user fails.
A known workaround is to set the Impala hint SYNC_DDL=true, at the cost of slowing down some DDL/DML statements.
More info: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_sync_ddl.html
Name or password is incorrect
Java doesn't start, and the log shows the error 'Name or password is incorrect'.
Refers to an invalid username/password combination maintained in /DVD/JCO_MNG for the Java version. Make sure that the correct password is provided and that the user is not locked.
String index out of range
Usually refers to a typo in the password or a password that is not properly encrypted. All passwords (truststore password, Hive password, Impala password...) need to be hashed using the report /DVD/XOR_GEN.
Make sure that all hashed passwords are correct in /DVD/SM_SETUP and in table /DVD/HDP_CUS_C.
User 'xx' does not have privileges to access 'xx'
Sentry/Ranger permissions are not correctly set. Refer to the installation guide on how to set them and correct the setup.
Certificate chaining error
The system can't establish a secure TLS-encrypted session. Make sure that the correct certificates are stored in the jssecacerts file (Java truststore).
Error during the closing of the session - Session <ID> does not exist
Happens when two Java processes with the same program ID are running on the same application server. Kill one of the processes. This issue is fixed in the latest versions of the Java connector.
Connection timeout expired
Usually caused by the unavailability of a target service or network instability.
java.sql.SQLException: [Cloudera][HiveJDBCDriver](700100) Connection timeout expired. Details: None.
Couldn't acquire the DB log notification lock
The error is visible in the Java log.
Because the SAP CLI cannot print a very long one-line entry, the following parsing command was run through SAP report RSBTCOS0 (SE38) to split the entry into separate lines:
grep acquire app.log | tail -1 | awk -F':' '{for (i=1;i<=NF;i++) print $i}'
The error message not only recommends increasing the maximum number of retries for acquiring the Hive notification lock, but also carries a second error message:
Error executing SQL query "select "NEXT_EVENT_ID" from "NOTIFICATION_SEQUENCE" for update".
This is a failed attempt to acquire a DB lock for the NOTIFICATION_LOG table in PostgreSQL (Metastore DB). In our case, it was caused by a DB overload, which in turn was caused by a Ranger policy (Hortonworks distribution) applied on the /user/<sid>hdp directory.
Every time a temporary external table was created during loads from SNP OutBoard™ (rapid creation with parallel loads), the Ranger policy was triggered and had to update the metastore table TBL_COL_PRIVS for the new HDFS object.
The recommendation is not to create a Ranger policy on the HDFS landing zone (or to disable an existing one). Instead, allow 'hive' users to read, write, and execute (rwx) on the directory /user/<sid>hdp by setting its group ownership to the group 'Hadoop' (the default primary group of the hive user in Hortonworks).
Error in acquiring locks
The error is visible in the Java log and the HiveServer2 log:
FAILED: Error in acquiring locks: Lock acquisition for LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, dbname:sapdvd_qb1, tablename:dv1_cnqb1000006, operationType:INSERT, isAcid:false, isDynamicPartitionWrite:true), LockComponent(type:SHARED_READ, level:TABLE, dbname:sapdvd_qb1, tablename:dvd_201905311537435165140, operationType:SELECT)], txnid:0, user:hive/HOSTNAME25.COM@DLAST.HDP.ETC.DLA.MIL, hostname:HOSTNAME25.COM, agentInfo:hive_20190531195433_8c2400cc-0f6e-4aea-8716-df5401b5c15d) timed out after 5506174ms. LockResponse(lockid:86547, state:WAITING)
This error happens when CSV files are loaded to HDFS, a temporary external table is created, and the 'INSERT from SELECT' query is executed. The INSERT operation requires an exclusive lock, which could not be acquired.
It is unclear why the Hive table (dv1_cnqb1000006) was locked, but the lock lasted for more than 4 hours (3 repeated attempts, each timing out after roughly 1.5 hours, about 5,500,000 ms).
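The duration can be checked directly from the timeout value in the log line. A small worked calculation, using only the 5506174 ms value from the error message and the three attempts stated above:

```python
TIMEOUT_MS = 5506174  # "timed out after 5506174ms" in the log line
ATTEMPTS = 3          # number of repeated attempts


def ms_to_hours(milliseconds: float) -> float:
    """Convert milliseconds to hours."""
    return milliseconds / (1000 * 60 * 60)


per_attempt_hours = ms_to_hours(TIMEOUT_MS)
total_hours = per_attempt_hours * ATTEMPTS
print(f"{per_attempt_hours:.2f} h per attempt, {total_hours:.2f} h in total")
# prints "1.53 h per attempt, 4.59 h in total"
```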
The recommendation is to identify the source of the table lock via the 'SHOW LOCKS <TABLE_NAME>' query.
The lock in this scenario was caused by a Hive misconfiguration: the ACID tables option was turned on, but compaction was not enabled.
JDBC connection hanging with Impala storage
There is a time delay at the beginning of the extraction process before data is processed and replicated.
10.12.2019 01:00:01 Read filter for variant 'ZDVD_GLUE_V1'.
10.12.2019 01:15:48 Start of extraction: '01:15:48'.
Example of a delay of almost 16 minutes (01:00:01 to 01:15:48) before the start of the extraction.
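The gap between the two job log entries quoted above can be computed from their timestamps. This is a small sketch; the helper name delay_between is hypothetical, and the format string assumes the DD.MM.YYYY HH:MM:SS layout seen in the job log.

```python
from datetime import datetime, timedelta

JOB_LOG_TIME_FORMAT = "%d.%m.%Y %H:%M:%S"  # e.g. "10.12.2019 01:00:01"


def delay_between(start: str, end: str) -> timedelta:
    """Return the time elapsed between two job log timestamps."""
    started = datetime.strptime(start, JOB_LOG_TIME_FORMAT)
    ended = datetime.strptime(end, JOB_LOG_TIME_FORMAT)
    return ended - started


delay = delay_between("10.12.2019 01:00:01", "10.12.2019 01:15:48")
print(delay)  # prints "0:15:47"
```

Comparing this gap with the timestamps in the Impala logs helps to confirm whether the time was spent waiting for Impala or elsewhere.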
Impala logs
1210 01:03:50.986474 32599 ImpaladCatalog.java:202] Adding: CATALOG_SERVICE_ID version: 173983 size: 49
1210 01:03:50.986577 32599 impala-server.cc:1433] Catalog topic update applied with version: 173983 new min catalog object version: 167996
1210 01:04:51.007951 32599 ImpaladCatalog.java:202] Adding: PRIVILEGE:server=server1->db=db1->grantoption=false.51 version: 173999 size: 111
The Impala catalog update took 1 minute due to many open connections at the same time.
This can happen when connection pooling is turned on in the settings and the Java connector is deployed on multiple SAP application servers. For example, with connection pooling configured to 10 connections on each of 10 SAP application servers, the first SAP data replication on each application server triggers the creation of its pool against the Impala service on Hadoop, up to 100 connections in total. This creates a heavy load on the DB used to store Impala metadata on Hadoop (typically MySQL or similar) and leads to performance issues and delays (catalog refresh, metadata refresh, network issues, impact on other applications, etc.).
The recommendation is to size connection pools with the number of SAP application servers in mind, or to use pooling only with a central Java connector (deployed on a single SAP application server).
Connection pooling in the Java connector uses a simple Object Pool pattern: the JDBC connections are created up front, up to the configured size of the connection pool.
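To illustrate why eager pooling multiplies the load, here is a minimal sketch of an Object Pool that opens all of its connections at construction time. It is an illustration of the pattern only, not the Java connector's actual code; the class and function names are hypothetical, and create_connection stands in for whatever opens a JDBC connection.

```python
import queue


class ConnectionPool:
    """Fixed-size object pool that creates all connections up front."""

    def __init__(self, create_connection, size: int):
        self._idle = queue.Queue(maxsize=size)
        # Eager creation: every connection is opened as soon as the pool is built.
        for _ in range(size):
            self._idle.put(create_connection())

    def acquire(self, timeout: float = None):
        """Take an idle connection; raises queue.Empty if none frees up in time."""
        return self._idle.get(timeout=timeout)

    def release(self, connection) -> None:
        """Return a connection to the pool for reuse."""
        self._idle.put(connection)

    @property
    def idle_count(self) -> int:
        """Number of connections currently waiting in the pool."""
        return self._idle.qsize()


def total_connections(pool_size: int, app_servers: int) -> int:
    """Connections opened against the service when every server fills its pool."""
    return pool_size * app_servers
```

With a pool size of 10 on 10 application servers, total_connections(10, 10) gives 100 connections opened at roughly the same time, which is the situation described above.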