SAP <-> Hadoop communication

...

Apache Hive is data warehouse software in the Hadoop ecosystem that facilitates reading, writing, and managing large datasets in distributed storage using an SQL-like query language.

Apache Impala is a massively parallel processing (MPP) SQL query engine for data stored in a Hadoop cluster. Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries against data stored in HDFS.


Both engines support an SQL-like query language for executing DDL and DML operations (described below in more detail).

HDFS is also used to transfer data to the Hive/Impala engines. First, the data is moved to HDFS as a .csv file; afterwards, the engine loads the transferred file.
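The two-step transfer can be illustrated with a minimal Java sketch. It only shows how a CSV row and a load statement might be assembled; the class name CsvStagingSketch, the method names, the staging path, and the table name are hypothetical and do not come from the Datavard connector. The sketch assumes the engine picks the file up with a LOAD DATA INPATH statement, which both Hive and Impala support, although this page does not state which load mechanism is actually used. The upload of the file to HDFS itself is left out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only (not the Datavard implementation).
final class CsvStagingSketch {

    // Joins the fields of one row into a single CSV line.
    static String toCsvLine(List<String> fields) {
        return fields.stream()
                .map(CsvStagingSketch::quote)
                .collect(Collectors.joining(","));
    }

    // Wraps a field in double quotes when it contains a comma, a quote, or a newline.
    // Assumption: the target table is defined with a SerDe that understands quoted CSV.
    private static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Builds the statement that asks the engine to load a file already present in HDFS.
    // Assumption: hdfsPath and table are trusted values; no escaping is applied here.
    static String loadStatement(String hdfsPath, String table) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + table;
    }
}
```

For example, CsvStagingSketch.loadStatement("/tmp/staging/data.csv", "demo_table") yields a statement that could be sent to the engine once the file has been written to that HDFS path.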

Communication with Hive/Impala

Communication with the Hive/Impala engines goes through a Java connector implemented by Datavard. The connector wraps SQL-like queries using the JDBC JARs and forwards them to the engines.
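The idea of forwarding a query over JDBC can be sketched with the standard java.sql API. This is not the Datavard connector's code: the class name EngineJdbcSketch and its methods are hypothetical, and the sketch assumes a HiveServer2-style JDBC driver is on the classpath and that the engine is reachable at the given host and port.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical wrapper, for illustration only (not the Datavard implementation).
final class EngineJdbcSketch {

    // Builds a HiveServer2-style JDBC URL such as jdbc:hive2://host:10000/default.
    // Assumption: the installed driver accepts this URL scheme.
    static String hiveServer2Url(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Opens a connection, forwards one SQL statement to the engine,
    // and closes the statement and connection afterwards.
    static void forward(String url, String user, String password, String sql)
            throws SQLException {
        try (Connection connection = DriverManager.getConnection(url, user, password);
             Statement statement = connection.createStatement()) {
            statement.execute(sql);
        }
    }
}
```

A caller would build the URL once and then pass each DDL or DML statement to forward; a production connector would more likely reuse connections than open one per statement.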

...