SAP <-> Hadoop communication

...

Storage Management (SM) is part of the Reuse Library and contains implementations of binary and table storage. For both storage types, SM provides a facade, the Storage Manager. For more information about SM, see /wiki/spaces/ReuseLib/pages/224985187.

In this case (Hadoop connection), Table Storage is used. 

...

Apache Hive is data warehouse software in the Hadoop ecosystem that facilitates reading, writing, and managing large datasets in distributed storage using an SQL-like query language.

Apache Impala is a massively parallel processing (MPP) SQL query engine for data stored in a Hadoop cluster. Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries against data stored in HDFS.


Both engines support an SQL-like query language for executing DDL and DML operations (these are described below in more detail).

HDFS is also used to transfer data to the Hive/Impala engines. First, the data is moved to HDFS as a .csv file; afterwards, the engine loads the transferred data.
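
The two-step transfer can be illustrated with a minimal Java sketch. It is not the Datavard implementation: the class name, the staging path, and the table name are hypothetical, and the sketch assumes that field values contain no commas or line breaks. It only renders rows as .csv text and builds the LOAD DATA statement that Hive and Impala accept for a file already present in HDFS; the upload to HDFS itself is left out.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of the actual connector. */
final class CsvStagingSketch {

    /** Renders rows as plain comma-separated text, one row per line. */
    static String toCsv(List<List<String>> rows) {
        return rows.stream()
                .map(row -> row.stream()
                        .map(CsvStagingSketch::checkField)
                        .collect(Collectors.joining(",")))
                .collect(Collectors.joining("\n"));
    }

    /** Assumption of this sketch: fields never contain the delimiter or a line break. */
    static String checkField(String field) {
        if (field.contains(",") || field.contains("\n")) {
            throw new IllegalArgumentException("Unsupported character in field: " + field);
        }
        return field;
    }

    /** Builds the statement that asks the engine to load a file already staged in HDFS. */
    static String loadStatement(String hdfsPath, String table) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + table;
    }
}
```

For example, loadStatement("/tmp/staging/data.csv", "demo_table") yields LOAD DATA INPATH '/tmp/staging/data.csv' INTO TABLE demo_table, which would be sent to the engine once the file has been written to that HDFS path.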

Communication with Hive/Impala

When communicating with the Hive/Impala engines, a Java connector (implemented by Datavard) is used. This connector wraps SQL-like queries using JDBC jars and forwards them to the engines.
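
The connector's own API is not described here, so the following is only a rough sketch of how a statement can be forwarded over plain JDBC. It assumes a HiveServer2-style endpoint and a Hive JDBC driver on the classpath; the class name, host, and credentials are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical illustration; the real connector may be structured differently. */
final class JdbcForwardSketch {

    /** Builds a HiveServer2-style JDBC URL, e.g. jdbc:hive2://host:10000/default. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    /** Opens a connection, forwards one statement, and closes both resources again. */
    static void execute(String url, String user, String password, String sql)
            throws SQLException {
        // try-with-resources closes the statement and the connection even on failure
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.execute(sql);
        }
    }
}
```

A call such as execute(jdbcUrl("hive-host", 10000, "default"), user, password, sql) would then send one DDL or DML statement to the engine, provided a matching JDBC driver is available at runtime.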

...