(GLUE-1905) User Guide

Datavard Glue is a middleware solution for integrating SAP systems with Hadoop. It can run on any SAP system (ERP, BW, Solution Manager, etc.) with an ABAP stack higher than 7.0.1. Datavard Glue is operated centrally from SAPGUI, which helps SAP professionals access Hadoop technology.
The connection is provided by a direct ABAP to Hadoop connector.
The two main use cases, from a user's perspective, are described below:

  1. SAP BW analytics with Hadoop data
  2. Big Data analytics with SAP data



SAP BW analytics with Hadoop data

Using Datavard Glue, it is possible to perform business intelligence data analytics on both SAP and Hadoop data. Data from Hadoop does not necessarily need to be replicated to SAP; it can be virtualized and then integrated into the SAP data flow.
The picture below divides the entire process into 3 main steps:

  1. Business Intelligence Application queries data from SAP BW system with specific user authorizations
  2. Datavard Glue Virtual InfoProvider retrieves data from Hadoop (Hive) table
  3. Datavard Glue returns a combination of SAP and Hadoop data.


 

In the scheme above, SAP Business Objects is shown only as an example; various other business analytics front ends can be used instead.


Big Data analytics with SAP data

It is possible to perform Big Data analytics with SAP data and combine it with data from various sources, such as sensor measurements or social media.
Using Datavard Glue, SAP data can be replicated to Hadoop together with SAP authorizations. Once in Hadoop, it can be accessed and processed by a Big Data analytics tool such as Tableau, Power BI, Sisense, or others.


 

Key features and benefits

  • Modeling database tables in Hadoop
  • Access to data in Hadoop from SAPGUI
  • Data extraction from SAP system into Hadoop or the other way round
  • Possibility to use Hadoop storage options for data extraction: Hive and Impala
  • Integration of Hadoop data into the SAP data flow
  • Possibility to use Hadoop script editors (Hive, Impala, Pig)
  • System integration with SAP Transport Management System


Modeling database tables in Hadoop
With Datavard Glue, database tables can be created in much the same way as in the ABAP Dictionary (SE11). The user specifies the data types of the fields, and based on this definition the system creates a corresponding table on the Hadoop side.
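
To illustrate the idea of deriving a Hadoop table from a field definition, here is a minimal Python sketch that turns a list of ABAP-style fields into a Hive CREATE TABLE statement. The type mapping, the helper names (hive_column, create_table_ddl) and the table name ZGLUE_DEMO are assumptions made for illustration only; they do not describe how Datavard Glue maps types internally.

```python
# Hypothetical mapping from ABAP Dictionary types to Hive column types.
# Illustrative assumption only, not Datavard Glue's actual mapping.
ABAP_TO_HIVE = {
    "CHAR": "STRING",
    "NUMC": "STRING",  # numeric text, kept as a string to preserve leading zeros
    "DATS": "STRING",  # date kept as YYYYMMDD text
    "INT4": "INT",
    "DEC": "DECIMAL",
}


def hive_column(name, abap_type, length=None, decimals=0):
    """Return one Hive column definition for an ABAP-style field."""
    hive_type = ABAP_TO_HIVE[abap_type]
    if hive_type == "DECIMAL":
        hive_type = f"DECIMAL({length},{decimals})"
    return f"`{name.lower()}` {hive_type}"


def create_table_ddl(table, fields):
    """Build a CREATE TABLE statement from (name, type, length, decimals) tuples."""
    columns = ",\n  ".join(hive_column(*field) for field in fields)
    return f"CREATE TABLE IF NOT EXISTS `{table.lower()}` (\n  {columns}\n)"


# Hypothetical field list, comparable to what a user would enter in SE11.
fields = [
    ("MATNR", "CHAR", 18, 0),
    ("ERDAT", "DATS", 8, 0),
    ("MENGE", "DEC", 13, 3),
]
ddl = create_table_ddl("ZGLUE_DEMO", fields)
print(ddl)
```

The sketch only builds the statement text; running it against a cluster is outside its scope.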


Accessing data in Hadoop
Datavard Glue Data Browser provides access to data in Hadoop directly from SAPGUI. This functionality is similar to the ABAP Data Browser (SE16) and, like it, is used only to view the content of tables.


Integration of Hadoop data with SAP data flow
Datavard Glue InfoProvider retrieves data from Hadoop so that it can be used for BW reporting. The user can map Hadoop data to BW InfoObjects, which simplifies the integration between Hadoop and complex SAP data flows.
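
As a rough illustration of what mapping Hadoop data to BW InfoObjects means, the Python sketch below renames the columns of one Hive row to InfoObject names. The column names, the chosen InfoObjects and the function to_infoobject_row are hypothetical; in Datavard Glue the mapping is maintained in SAPGUI, not in code like this.

```python
# Hypothetical mapping of Hive column names to BW InfoObject names.
COLUMN_TO_INFOOBJECT = {
    "customer_id": "0CUSTOMER",
    "calday": "0CALDAY",
    "amount": "0AMOUNT",
}


def to_infoobject_row(hive_row, mapping=COLUMN_TO_INFOOBJECT):
    """Rename the mapped columns of one Hive row; unmapped columns are dropped."""
    return {mapping[col]: value for col, value in hive_row.items() if col in mapping}


# Example row as it might come back from a Hive table (made-up values).
row = {"customer_id": "0000001000", "calday": "20190501", "amount": 125.5, "raw_payload": "x"}
mapped = to_infoobject_row(row)
print(mapped)
```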


Data extraction
Data extraction between SAP and Hadoop is supported in both directions. To extract data from SAP, the user creates a Hive (Hadoop) table from SAPGUI and triggers the extraction through an SAP job. The data is transferred using Hive or Impala storage.
Following the same principles, data can also be transferred the other way round, from Hadoop into an SAP DDIC table. It is also possible to create an SAP table based on an existing Hadoop table and transfer data to it afterwards.


Hadoop script editor
Datavard Glue Script Editor was created to bridge the gap between SAP and Hadoop environments. With Datavard Glue Script Editor, the user can create Hadoop-specific scripts in SAPGUI and then trigger them on Hadoop.
The following script types are supported:

  • Hive
  • Impala
  • Pig


Integration with SAP Transport Management System
All objects created with Datavard Glue can be transferred using the SAP Transport Management System (TMS). Glue objects are created on an SAP development system; Glue tables are simultaneously created on the Hadoop cluster or other database connected to that development system. With SAP TMS, the metadata of the Datavard Glue objects is transferred to the SAP Quality/Test/Production system. Using Glue TMS, the user can then trigger the creation of Glue objects on the target system based on the imported metadata. For Glue tables, this later step also creates the Hadoop tables on the Hadoop Quality/Test/Production cluster or other database.

Hadoop components utilized by Datavard Glue


Hadoop
The Apache Hadoop software library is a framework that allows distributed processing of large data sets across computer clusters using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.


HttpFS / WebHDFS 
HttpFS and WebHDFS are very similar HTTP-based services. The main difference lies in how they handle a redirection requested by a Hadoop NameNode: HttpFS handles the redirection itself, while WebHDFS requires the client to follow it. The recommended Hadoop version is 2.6.0 or higher, in which major supportability improvements and bug fixes were applied to WebHDFS and HttpFS.
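
Because both services expose the same REST layout, a request differs mainly in the host and port it targets. The Python sketch below builds such URLs; the host names, paths and the helper webhdfs_url are placeholders chosen for illustration, and the ports are the usual defaults (50070 for the NameNode web interface in Hadoop 2.x, 14000 for HttpFS), which may differ on a given cluster. With WebHDFS, an OPEN request is answered with an HTTP redirect to a DataNode that the client has to follow, whereas HttpFS returns the data from the same endpoint.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS-style REST URL; HttpFS accepts the same URL layout."""
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Placeholder hosts; ports are common defaults and are assumptions here.
open_url = webhdfs_url("namenode.example.com", 50070, "/user/glue/data.csv", "OPEN")
list_url = webhdfs_url(
    "httpfs.example.com", 14000, "/user/glue", "LISTSTATUS", {"user.name": "glue"}
)
print(open_url)
print(list_url)
```

The sketch only constructs the URLs; sending the request and following a redirect is left to whatever HTTP client is in use.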


Hive
Hive is a data warehousing infrastructure based on Apache Hadoop. Hadoop provides massive scale-out and fault tolerance capabilities for data storage and processing on commodity hardware.
Hive is designed to enable easy data summarization, ad-hoc querying and analysis of large volumes of data, and it provides SQL for these tasks. At the same time, Hive's SQL gives users multiple places to integrate their own functionality for custom analysis, such as User Defined Functions (UDFs).


Impala
The Apache Impala project provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats. The fast response for queries enables interactive exploration and fine-tuning of analytic queries, rather than long batch jobs traditionally associated with SQL-on-Hadoop technologies.