Datavard Glue is a middleware solution for integrating SAP systems with Hadoop. It can run on any SAP system (ERP, BW, Solution Manager, etc.) with an ABAP stack higher than 7.0.1. Datavard Glue is operated centrally from SAPGUI, which helps SAP professionals access Hadoop technology.
The connection is provided by a direct ABAP-to-Hadoop connector.
The two main use cases are described below from the user's perspective:

SAP BW analytics with Hadoop data
Big Data analytics with SAP data

SAP BW analytics with Hadoop data

Using Datavard Glue, it is possible to perform business intelligence analytics on both SAP and Hadoop data. Data from Hadoop does not need to be replicated to SAP; it can be virtualized and then integrated into the SAP data flow.
The picture below divides the entire process into three main steps:

A Business Intelligence application queries data from the SAP BW system with specific user authorizations.
The Datavard Glue Virtual InfoProvider retrieves data from a Hadoop (Hive) table.
Datavard Glue returns a combination of SAP and Hadoop data.

SAP BusinessObjects is shown in the scheme above only as an example; various other business analytics front ends can be used instead.

Big Data analytics with SAP data

It is possible to perform Big Data analytics with SAP data and combine it with data from various sources, such as sensor measurements or social media.
Using Datavard Glue, SAP data can be replicated to Hadoop together with SAP authorizations. Once in Hadoop, the data can be accessed and processed by a Big Data analytics tool such as Tableau, Power BI, or Sisense.

Key features and benefits

Modeling database tables in Hadoop
Access to data in Hadoop from SAPGUI
Data extraction from SAP system into Hadoop or the other way round
Possibility to use Hadoop storage options for data extraction (Hive and Impala)
Integration of Hadoop data into the SAP data flow
Possibility to use Hadoop script editors (Hive, Impala, Pig)
System integration with SAP Transport Management System

Modeling database tables in Hadoop
Using Datavard Glue, it is possible to create database tables in a similar way as in the ABAP Dictionary (SE11). The user specifies the data types of the fields, and based on this definition the system creates a corresponding table on the Hadoop side.
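As a rough illustration of this idea, the sketch below turns a list of field definitions into a Hive CREATE TABLE statement. The type mapping, the function names and the sample table ZSALES are assumptions made for this example; they do not describe the mapping that Datavard Glue applies internally.

```python
# Hypothetical mapping from ABAP Dictionary types to Hive types.
# This is an illustrative assumption, not the mapping used by Datavard Glue.
ABAP_TO_HIVE = {
    "CHAR": "STRING",
    "NUMC": "STRING",
    "DATS": "STRING",   # ABAP dates are stored as YYYYMMDD character strings
    "INT4": "INT",
    "FLTP": "DOUBLE",
}


def hive_type(abap_type, length=0, decimals=0):
    """Return a Hive column type for an ABAP Dictionary type."""
    if abap_type == "DEC":
        return f"DECIMAL({length},{decimals})"
    return ABAP_TO_HIVE[abap_type]


def build_hive_ddl(table, fields):
    """Build a CREATE TABLE statement from (name, type, length, decimals) tuples."""
    columns = ",\n".join(
        f"  `{name.lower()}` {hive_type(abap_type, length, decimals)}"
        for name, abap_type, length, decimals in fields
    )
    return f"CREATE TABLE IF NOT EXISTS `{table.lower()}` (\n{columns}\n)"


# Hypothetical table definition used only for demonstration.
SALES_FIELDS = [
    ("ORDER_ID", "NUMC", 10, 0),
    ("CUSTOMER", "CHAR", 10, 0),
    ("ORDER_DATE", "DATS", 8, 0),
    ("AMOUNT", "DEC", 15, 2),
]

if __name__ == "__main__":
    print(build_hive_ddl("ZSALES", SALES_FIELDS))
```

Keeping the type mapping in a single dictionary makes it easy to review and adjust when a target system expects different types.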

Accessing data in Hadoop
The Datavard Glue Data Browser makes it possible to access data in Hadoop directly from SAPGUI. This functionality is similar to the ABAP Data Browser (SE16) and, likewise, is used only to view the content of tables.

Integration of Hadoop data with SAP data flow
The Datavard Glue InfoProvider retrieves data from Hadoop and makes it available for BW reporting. The user can map Hadoop data to BW InfoObjects, which simplifies the integration of Hadoop data into complex SAP data flows.
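Conceptually, such a mapping renames the columns of a Hadoop result row to InfoObject names. The following minimal sketch shows the idea only; the column names, the chosen InfoObjects and the function map_row are hypothetical and are not part of the Datavard Glue interface.

```python
# Hypothetical mapping of Hive column names to BW InfoObject names.
COLUMN_TO_INFOOBJECT = {
    "customer_id": "0CUSTOMER",
    "calendar_day": "0CALDAY",
    "amount": "0AMOUNT",
}


def map_row(row, mapping=COLUMN_TO_INFOOBJECT):
    """Rename the keys of one Hadoop row to InfoObject names.

    Columns without an entry in the mapping are dropped.
    """
    return {mapping[col]: value for col, value in row.items() if col in mapping}


if __name__ == "__main__":
    hive_row = {"customer_id": "C1", "calendar_day": "20190131", "amount": 10.5}
    print(map_row(hive_row))
```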

Data extraction
Data extraction between SAP and Hadoop is supported in both directions. To extract data from SAP, the user creates a Hive (Hadoop) table from SAPGUI and triggers the extraction through an SAP job. The data is transferred using Hive or Impala storage.
Following the same principles, data can also be transferred the other way round, from Hadoop into an SAP DDIC table. It is also possible to create an SAP table based on an existing Hadoop table and transfer data to it afterwards.

Hadoop script editor
The Datavard Glue Script Editor was created to bridge the gap between the SAP and Hadoop environments. With the Script Editor, the user can create Hadoop-specific scripts in SAPGUI and then trigger them on Hadoop.
The following script types are supported:

Hive
Impala
Pig

Integration with SAP Transport Management System
All objects created with Datavard Glue can be transported using the SAP Transport Management System (TMS). Glue objects are created on an SAP development system; Glue tables are simultaneously created on the Hadoop cluster or other database connected to that development system. With SAP TMS, the metadata of the Datavard Glue objects is transferred to the SAP Quality/Test/Production system. Using Glue TMS, the user can then trigger the creation of the Glue objects on the target system based on the imported metadata. For Glue tables, this later step also creates the corresponding Hadoop tables on the Hadoop Quality/Test/Production cluster or other database.

Hadoop components utilized by Datavard Glue

Hadoop
The Apache Hadoop software library is a framework that allows distributed processing of large data sets across computer clusters using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer. This means delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

HttpFS / WebHDFS
HttpFS and WebHDFS are very similar HTTP-based services. The main difference lies in how a redirection requested by a Hadoop NameNode is handled: HttpFS handles the redirection itself, while WebHDFS requires the client to follow it. The recommended Hadoop version is 2.6.0 or higher, in which major supportability improvements and bug fixes were applied to WebHDFS and HttpFS.
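The client-side part of this difference can be sketched as follows. Both services expose the same REST interface under /webhdfs/v1, so one URL builder covers both. The host names, the sample path and the port numbers (commonly used defaults in Hadoop 2.x) are assumptions for illustration, and the helper functions are hypothetical; no request is actually sent.

```python
from urllib.parse import quote, urlencode

# Commonly used default ports in Hadoop 2.x; treat them as assumptions and
# check the configuration of your own cluster.
WEBHDFS_PORT = 50070   # NameNode HTTP port
HTTPFS_PORT = 14000    # HttpFS gateway port


def rest_url(host, port, hdfs_path, op, **params):
    """Build a URL for the REST API exposed under /webhdfs/v1."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def follow_up_url(status, headers):
    """Return the URL the client must call next, or None if the response is final.

    A NameNode answers data requests with status 307 and a Location header
    that points to a DataNode; the client has to repeat the request there.
    """
    if status == 307:
        return headers["Location"]
    return None


if __name__ == "__main__":
    # Reading a file through WebHDFS: the first call goes to the NameNode.
    print(rest_url("namenode.example.com", WEBHDFS_PORT, "/user/glue/sales.csv", "OPEN"))
    # The same request sent to an HttpFS gateway, which contacts the
    # DataNodes on behalf of the client.
    print(rest_url("httpfs.example.com", HTTPFS_PORT, "/user/glue/sales.csv", "OPEN"))
```

With WebHDFS the client therefore needs network access to the DataNodes, whereas with HttpFS only the gateway has to be reachable.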

Hive
Hive is a data warehousing infrastructure based on Apache Hadoop, which provides massive scale-out and fault tolerance capabilities for data storage and processing on commodity hardware.
Hive is designed to enable easy data summarization, ad-hoc querying and analysis of large volumes of data through SQL. At the same time, Hive's SQL gives users multiple places to integrate their own functionality for custom analysis, such as User Defined Functions (UDFs).

Impala
The Apache Impala project provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats. The fast response for queries enables interactive exploration and fine-tuning of analytic queries, rather than long batch jobs traditionally associated with SQL-on-Hadoop technologies.
