(Glue-2105) User Guide

Datavard Glue is a middleware solution for integrating SAP systems with Data lakes. It can run on any SAP system (ERP, BW, Solution Manager, etc.) with an ABAP stack higher than 7.0.1. Datavard Glue is operated centrally from SAPGUI, which helps SAP professionals access Data lake technology.
The connection is provided by a direct ABAP to Data lake connector.
The two main use cases are described below from the perspective of a user:

  1. SAP BW analytics with Data lake
  2. Big Data analytics with SAP data



SAP BW analytics with Data lake

Using Datavard Glue, it is possible to perform business intelligence data analytics on both SAP and Data lake data. Data from the Data lake does not necessarily need to be replicated to SAP; it can be virtualized and then integrated into the SAP data flow.
The picture below divides the entire process into 3 main steps:

  1. Business Intelligence Application queries data from SAP BW system with specific user authorizations
  2. Datavard Glue Virtual InfoProvider retrieves data from Data lake (Hive) table 
  3. Datavard Glue returns a combination of SAP and Data lake data.


 

In the scheme above, SAP BusinessObjects is shown only as an example; various other business analytics front ends can be used instead.



Big Data analytics with SAP data

It is possible to perform Big Data analytics with SAP data and combine it with data from various sources, such as sensor measurements or social media.
Using Datavard Glue, SAP data can be replicated to a Data lake. Once in the Data lake, it can be accessed and processed by a Big Data analytics tool such as Tableau, Power BI, Sisense, or others.


 

Key features and benefits

  • Modeling database tables in the target storage
  • Access to data in a Data lake from SAPGUI
  • Data extraction from SAP system into a Data lake or the other way round
  • Possibility to use Hadoop storage options for data extraction – Hive and Impala
  • Integration of Data lake data into the SAP data flow
  • System integration with SAP Transport Management System


Modeling database tables in a Data lake
Using Datavard Glue, it is possible to create database tables in a similar way as in the ABAP Dictionary (SE11). The user specifies the data types, and based on this specification the system creates a corresponding table on the Data lake side.
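
The idea of deriving a target table from a dictionary-style definition can be sketched as follows. This is only an illustration: the type mapping, the function name hive_create_table, and the table ZSALES are assumptions made for this example and do not describe the mapping Datavard Glue actually applies.

```python
# Hypothetical mapping from ABAP Dictionary data types to Hive column types.
# Illustrative assumption only, not the mapping used by Datavard Glue.
ABAP_TO_HIVE = {
    "CHAR": "STRING",
    "NUMC": "STRING",   # numeric text; kept as a string to preserve leading zeros
    "DATS": "STRING",   # dates kept as YYYYMMDD text
    "INT4": "INT",
    "DEC": "DECIMAL",
    "FLTP": "DOUBLE",
}


def hive_create_table(table, fields):
    """Build a Hive CREATE TABLE statement.

    fields is a list of (name, abap_type, length, decimals) tuples.
    """
    columns = []
    for name, abap_type, length, decimals in fields:
        hive_type = ABAP_TO_HIVE[abap_type]
        if hive_type == "DECIMAL":
            # Hive expects DECIMAL(precision, scale)
            hive_type = f"DECIMAL({length},{decimals})"
        columns.append(f"  `{name.lower()}` {hive_type}")
    body = ",\n".join(columns)
    return f"CREATE TABLE IF NOT EXISTS `{table.lower()}` (\n{body}\n)"


# Hypothetical table definition, similar to what a user would enter field by field.
ddl = hive_create_table(
    "ZSALES",
    [("MATNR", "CHAR", 18, 0), ("ERDAT", "DATS", 8, 0), ("NETWR", "DEC", 15, 2)],
)
print(ddl)
```

Keeping the mapping in a single dictionary makes it easy to see which source types are supported and to reject unknown ones early, since an unmapped type raises a KeyError.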


Accessing data in a Data lake
The Datavard Glue Data Browser enables access to data in a Data lake directly from SAPGUI. This functionality is similar to the ABAP Data Browser (SE16) and, likewise, it is used only to view the content of tables.


Integration of Data lake data with SAP data flow
The Datavard Glue InfoProvider retrieves data from a Data lake and makes it available for BW reporting purposes. The user can map Data lake data to BW InfoObjects and thus simplify the integration between the Data lake and complex SAP data flows.
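
Conceptually, mapping Data lake data to InfoObjects amounts to renaming columns. The sketch below shows that idea only: the column names, the chosen InfoObjects, and the function to_infoobjects are assumptions for illustration, and in Datavard Glue the mapping is maintained in SAPGUI rather than in code.

```python
# Hypothetical mapping of Data lake (Hive) column names to BW InfoObjects.
COLUMN_TO_INFOOBJECT = {
    "material": "0MATERIAL",
    "calday": "0CALDAY",
    "amount": "0AMOUNT",
}


def to_infoobjects(rows, mapping=COLUMN_TO_INFOOBJECT):
    """Rename each row's keys to InfoObject names; unmapped columns are dropped."""
    return [
        {mapping[col]: value for col, value in row.items() if col in mapping}
        for row in rows
    ]


# One example row as it might come back from a Hive table (made-up values).
hive_rows = [
    {"material": "M-100", "calday": "20210501", "amount": 120.5, "etl_batch": 7}
]
bw_rows = to_infoobjects(hive_rows)
print(bw_rows)
```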


Data extraction
Data extraction between SAP and a Data lake is supported in both directions. To extract data from SAP, the user creates a table on the target storage from SAPGUI and triggers the extraction of data through an SAP job.
Following the same principles, data can also be transferred the other way round, from the target storage into an SAP DDIC table. It is also possible to create an SAP table based on an existing target storage table and transfer data to it afterward.


Integration with SAP Transport Management System
All objects created with Datavard Glue can be transferred using the SAP Transport Management System (TMS). Glue objects are created on an SAP development system. Glue tables are simultaneously created on the Data lake or other database connected to the SAP development system. SAP TMS transfers the metadata of the Datavard Glue objects to the SAP Quality/Test/Production system. Using Glue TMS, the user can then trigger the creation of Glue objects on the target system based on the imported metadata. For Glue tables, this later step also creates the target storage tables on the Quality/Test/Production Data lake or other database.

Hadoop components utilized by Datavard Glue


Hadoop
The Apache Hadoop software library is a framework that allows distributed processing of large data sets across computer clusters using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer. This makes it possible to deliver a highly available service on top of a cluster of computers, each of which may be prone to failures.


HttpFS / WebHDFS 
HttpFS and WebHDFS are very similar HTTP-based services. The main difference lies in how they handle the redirection requested by a Hadoop NameNode: HttpFS handles the redirection itself, while WebHDFS requires the assistance of the client. The recommended Hadoop version is 2.6.0 or higher, in which major supportability improvements and bug fixes were applied to WebHDFS and HttpFS.
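
The client-side redirect handling that WebHDFS requires can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not how Datavard Glue's ABAP connector is implemented: the host name, port, and file path are placeholders, and the helper names webhdfs_url and create_via_webhdfs are invented for this example. Only the /webhdfs/v1 URL layout, the op=CREATE operation, and the 307 redirect come from the WebHDFS REST API.

```python
import urllib.error
import urllib.parse
import urllib.request


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL; HttpFS exposes the same /webhdfs/v1 layout."""
    query = urllib.parse.urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects automatically, so both steps stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def create_via_webhdfs(namenode_url, data):
    """Create a file in two steps, as a WebHDFS client has to do."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        # Step 1: ask the NameNode without sending any file content.
        opener.open(urllib.request.Request(namenode_url, method="PUT"))
    except urllib.error.HTTPError as err:
        if err.code != 307:
            raise
        datanode_url = err.headers["Location"]
    else:
        raise RuntimeError("expected a 307 redirect from the NameNode")
    # Step 2: the client resends the request, with the payload, to the DataNode.
    request = urllib.request.Request(datanode_url, data=data, method="PUT")
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder host, port, and path; no network call is made here.
url = webhdfs_url(
    "namenode.example.com", 50070, "/data/zsales/part-0.csv", "CREATE",
    overwrite="true",
)
print(url)
```

With HttpFS, the second hop to a DataNode is not needed because the gateway talks to the cluster on the client's behalf, which is why it is the simpler option when the client cannot reach the DataNodes directly.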


Hive
Hive is a data warehousing infrastructure based on Apache Hadoop. Hadoop provides massive scale-out and fault tolerance capabilities for data storage and processing on commodity hardware.
Hive is designed to enable easy data summarization, ad-hoc querying, and analysis of large volumes of data through SQL. At the same time, Hive's SQL gives users multiple places to integrate their own functionality for custom analysis, such as User Defined Functions (UDFs).


Impala
The Apache Impala project provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats. The fast response for queries enables interactive exploration and fine-tuning of analytic queries, rather than long batch jobs traditionally associated with SQL-on-Hadoop technologies.