(Glue-2105) User Guide
Big Data analytics with SAP data
It is possible to perform Big Data analytics with SAP data and combine it with data from various sources, such as sensor measurements or social media.
Using Datavard Glue, SAP data can be replicated to a Data lake. Once in the Data lake, it can be accessed and processed by a Big Data analytics tool such as Tableau, Power BI, or Sisense.
Key features and benefits
- Modeling database tables in the target storage
- Access to data in a Data lake from SAPGUI
- Data extraction from an SAP system into a Data lake, and back again
- Possibility to use Hadoop storage options for data extraction – Hive and Impala
- Integration of Data lake data into the SAP data flow
- System integration with SAP Transport Management System
Modeling database tables in a Data lake
Using Datavard Glue, the user can create database tables in a similar way as in the ABAP Dictionary (SE11). The user specifies the data types, and based on this definition the system creates a corresponding table on the Data lake side.
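The idea of deriving a target storage table from ABAP-style field definitions can be sketched as follows. This is an illustrative assumption, not Datavard Glue's actual type mapping or DDL; the table name, field names, and the ABAP-to-Hive mapping are hypothetical:

```python
# Illustrative sketch: map common ABAP Dictionary data types to Hive column
# types and generate the corresponding CREATE TABLE statement. The mapping
# below is an assumption for demonstration purposes only.
ABAP_TO_HIVE = {
    "CHAR": "STRING",
    "NUMC": "STRING",          # numeric text; leading zeros must be preserved
    "DATS": "STRING",          # ABAP dates are stored as YYYYMMDD text
    "INT4": "INT",
    "DEC":  "DECIMAL(15,2)",
    "FLTP": "DOUBLE",
}

def create_table_ddl(table: str, fields: dict) -> str:
    """Build a HiveQL CREATE TABLE statement from an ABAP-style field list."""
    cols = ",\n  ".join(
        f"{name} {ABAP_TO_HIVE[abap_type]}" for name, abap_type in fields.items()
    )
    return f"CREATE TABLE {table} (\n  {cols}\n) STORED AS PARQUET;"

ddl = create_table_ddl("sales_orders", {"order_id": "NUMC", "amount": "DEC"})
print(ddl)
```

A tool following this pattern would execute the generated DDL against Hive or Impala, so that the same logical definition exists in both the SAP system and the Data lake.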
Accessing data in a Data lake
Datavard Glue Data Browser enables the user to access data in a Data lake directly from SAPGUI. This functionality is similar to the ABAP Data Browser (SE16) and, likewise, is used only to view the content of tables.
Integration of Data lake data with SAP data flow
Datavard Glue InfoProvider enables the user to retrieve data from a Data lake and use it for BW reporting purposes. The user can map Data lake data to BW InfoObjects and thus simplify the integration between the Data lake and complex SAP data flows.
Data extraction
Data extraction between SAP and a Data lake is supported in both directions. To extract data from SAP, the user creates a table on the target storage from SAPGUI and triggers the extraction through an SAP job.
Using the same principles, data can also be transferred in the opposite direction, from the target storage into an SAP DDIC table. It is also possible to create an SAP table based on an existing target storage table and transfer data to it afterwards.
Integration with SAP Transport Management System
All objects created with Datavard Glue can be transferred using the SAP Transport Management System (TMS). Glue objects are created on an SAP development system; Glue tables are simultaneously created on the Data lake or other database connected to that development system. With SAP TMS, the Datavard Glue object metadata is transported to the SAP Quality/Test/Production system. Using Glue TMS, the user can then trigger the creation of Glue objects on the target system based on the imported metadata. For Glue tables, this later step also creates the target storage tables on the Quality/Test/Production Data lake or other database.
Hadoop components utilized by Datavard Glue
Hadoop
The Apache Hadoop software library is a framework that allows distributed processing of large data sets across computer clusters using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle possible failures at the application layer. This means delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
HttpFS / WebHDFS
HttpFS and WebHDFS are two very similar HTTP-based REST services for accessing HDFS. The main difference lies in the handling of the redirection issued by a Hadoop NameNode: HttpFS handles the redirection itself and proxies the data transfer, while WebHDFS requires the client to follow the redirect to a DataNode. The recommended Hadoop version is 2.6.0 or higher, where major supportability improvements and bug fixes were applied to WebHDFS and HttpFS.
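Both services expose the same WebHDFS REST URL scheme, so a client only needs to point at a different host and port to switch between them. A minimal sketch of composing such a request URL (the hostname is a placeholder; port 50070 is the default NameNode HTTP port in Hadoop 2.x, while an HttpFS gateway typically listens on 14000):

```python
from urllib.parse import urlencode

def webhdfs_url(host: str, port: int, path: str, op: str, **params) -> str:
    """Compose a WebHDFS REST URL, e.g. for the OPEN or LISTSTATUS operation.

    The same URL scheme works against an HttpFS gateway, which transfers the
    data itself instead of redirecting the client to a DataNode as WebHDFS does.
    """
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

url = webhdfs_url("namenode.example.com", 50070, "/data/sales.csv", "OPEN")
print(url)
# http://namenode.example.com:50070/webhdfs/v1/data/sales.csv?op=OPEN
```

When issuing this request against WebHDFS, the client must be prepared to receive an HTTP 307 redirect and repeat the request against the DataNode address it points to; against HttpFS, the response body arrives directly.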
Hive
Hive is a data warehousing infrastructure based on Apache Hadoop. Hadoop provides massive scale-out and fault tolerance capabilities for data storage and processing on commodity hardware.
Hive is designed to enable easy data summarization, ad-hoc querying, and analysis of large volumes of data, and it provides an SQL interface for these tasks. At the same time, Hive's SQL dialect gives users multiple places to integrate their own functionality for custom analysis, such as User Defined Functions (UDFs).
Impala
The Apache Impala project provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats. The fast response for queries enables interactive exploration and fine-tuning of analytic queries, rather than long batch jobs traditionally associated with SQL-on-Hadoop technologies.