SAP <-> Hadoop communication
...
Apache Hive is data warehouse software in the Hadoop ecosystem that facilitates reading, writing, and managing large datasets in distributed storage using a SQL-like query language.
Apache Impala is a massively parallel processing (MPP) SQL query engine for data stored in a Hadoop cluster. Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries against data stored in HDFS.
Both engines support a SQL-like query language for executing DDL and DML operations (these are described below in more detail).
HDFS is also used when transferring data to the Hive/Impala engines. First, the data is moved to HDFS in the form of a .csv file; afterwards, the engine loads the transferred data.
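The two-step transfer (file to HDFS, then load by the engine) can be illustrated with the kind of statements an engine would receive once the .csv file is in place. This is a minimal sketch, not the product's actual implementation: the class name CsvLoadSketch, the table name, the column list, and the HDFS path are all hypothetical, and a comma-delimited text table is assumed. Both Hive and Impala accept a LOAD DATA INPATH statement of this form.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the SQL that would be sent to Hive/Impala
// after a .csv file has already been copied to HDFS.
final class CsvLoadSketch {

    // DDL for a plain-text staging table whose rows are comma-separated,
    // matching the layout of the transferred .csv file (assumption).
    static String createTableSql(String table, Map<String, String> columns) {
        StringJoiner cols = new StringJoiner(", ");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            cols.add(column.getKey() + " " + column.getValue());
        }
        return "CREATE TABLE IF NOT EXISTS " + table + " (" + cols + ")"
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                + " STORED AS TEXTFILE";
    }

    // DML that moves the file from its HDFS location into the table's storage.
    static String loadDataSql(String hdfsPath, String table) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + table;
    }
}
```

For example, calling loadDataSql("/tmp/transfer/sales.csv", "sales_stage") yields a statement that asks the engine to load the file at that (hypothetical) HDFS path into the staging table. The sketch does not escape or validate its inputs, so it is suitable only for trusted, fixed names.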
Communication with Hive/Impala
Communication with the Hive/Impala engines goes through a Java connector (implemented by Datavard). This connector uses the JDBC driver JARs to wrap SQL-like queries and forward them to the engines.
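To make the JDBC forwarding concrete, the following is a minimal sketch of how a Java program can send statements to an engine through the standard java.sql API. It is not the Datavard connector's code: the class name EngineClientSketch, the host, port, and database values are hypothetical, and it assumes a HiveServer2-compatible JDBC driver (URL prefix jdbc:hive2://) is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical illustration of forwarding SQL-like statements over JDBC.
final class EngineClientSketch {

    // Builds a HiveServer2-style JDBC URL (assumed driver and URL scheme).
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Opens a connection, executes each statement in order, and closes
    // the statement and connection again via try-with-resources.
    static void run(String url, String user, String password, String... statements)
            throws SQLException {
        try (Connection connection = DriverManager.getConnection(url, user, password);
             Statement statement = connection.createStatement()) {
            for (String sql : statements) {
                statement.execute(sql);
            }
        }
    }
}
```

A caller would pass the URL from jdbcUrl("hadoop-host", 10000, "default") together with credentials and one or more DDL/DML strings. Without a matching driver on the classpath, DriverManager.getConnection throws an SQLException.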
...