(GLUE-1812) User Guide
Big Data analytics with SAP data
Big Data analytics can be performed on SAP data combined with data from various sources, such as sensor measurements or social media.
Using Datavard Glue, SAP data, together with SAP authorizations, can be replicated to Hadoop. Once in Hadoop, it can be accessed and processed by a Big Data analytics tool such as Tableau, Power BI, or Sisense.
Key features and benefits
- Modeling database tables in Hadoop
- Access to data in Hadoop from SAPGUI
- Data extraction from the SAP system into Hadoop, and vice versa
- Possibility to use Hadoop storage options for data extraction – Hive and Impala
- Integration of Hadoop data into the SAP data flow
- Possibility to use Hadoop script editors (Hive, Impala, Pig)
- System integration with SAP Transport Management System
Modeling database tables in Hadoop
Using Datavard Glue, it is possible to create database tables in a similar way as in the ABAP Dictionary (SE11). The user specifies field names and data types, and based on this definition the system creates a corresponding table on the Hadoop side.
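To illustrate the idea, the sketch below derives a Hive CREATE TABLE statement from ABAP-style field definitions. The type mapping, table name, and field names are hypothetical examples; Datavard Glue performs this mapping internally, and the actual correspondence it uses may differ.

```python
# Illustrative sketch only: Glue maps DDIC types to Hive types internally.
# This mapping table and the example fields are hypothetical.
DDIC_TO_HIVE = {
    "CHAR": "STRING",
    "NUMC": "STRING",        # numeric text; kept as string to preserve leading zeros
    "DEC":  "DECIMAL(15,2)",
    "INT4": "INT",
    "DATS": "STRING",        # ABAP dates are YYYYMMDD character fields
}

def hive_ddl(table, fields):
    """Build a Hive CREATE TABLE statement from (name, ddic_type) pairs."""
    cols = ",\n  ".join(f"{name} {DDIC_TO_HIVE[ddic]}" for name, ddic in fields)
    return f"CREATE TABLE {table} (\n  {cols}\n) STORED AS PARQUET"

# Hypothetical sales table with a document number, amount, and creation date
print(hive_ddl("zsales", [("vbeln", "CHAR"), ("netwr", "DEC"), ("erdat", "DATS")]))
```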
Accessing data in Hadoop
Datavard Glue Data Browser enables the user to access data in Hadoop directly from SAPGUI. This functionality is similar to the ABAP Data Browser (SE16) and, likewise, is used only to view the content of tables.
Integration of Hadoop data with SAP data flow
Datavard Glue InfoProvider enables the user to retrieve data from Hadoop and use it for BW reporting purposes. The user can map Hadoop data to BW InfoObjects and thus simplify the integration between Hadoop and complex SAP data flows.
Data extraction
Data extraction between SAP and Hadoop is supported in both directions. To extract data from SAP, the user creates a Hive (Hadoop) table from SAPGUI and triggers the extraction through a SAP job. The data transfer is accomplished using Hive or Impala storage.
Using the same principles, data can also be transferred in the other direction, from Hadoop into a SAP DDIC table. It is also possible to create a SAP table based on an existing Hadoop table and transfer data into it afterwards.
Hadoop script editor
Datavard Glue Script Editor was created to bridge the gap between the SAP and Hadoop environments. Through the Datavard Glue Script Editor, the user can create Hadoop-specific scripts in SAPGUI and then trigger them on Hadoop.
The following script types are supported:
- Hive
- Impala
- Pig
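Besides built-in SQL, Hive can also call out to external scripts for row-level processing via its TRANSFORM clause, which pipes rows to a script as tab-separated lines on stdin and reads transformed lines back from stdout. The sketch below shows such a row-transform function; the column layout and the example values are hypothetical, and in a real streaming script the function would be applied in a loop over `sys.stdin`.

```python
# Sketch of the row-level logic for a Hive streaming transform script
# (invoked from HiveQL as: SELECT TRANSFORM(...) USING 'python clean.py' ...).
# Hive passes each row as one tab-separated line; the column layout here
# (id, city, amount) is a hypothetical example.

def transform_line(line):
    """Upper-case the second column of a tab-separated row."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) > 1:
        cols[1] = cols[1].upper()
    return "\t".join(cols)

# In the real script this would loop over sys.stdin and print each result.
print(transform_line("0001\tmunich\t42"))
```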
Integration with SAP Transport Management System
All objects created with Datavard Glue can be transported using the SAP Transport Management System (TMS). Glue objects are created on the SAP development system; in the case of Glue tables, corresponding tables are simultaneously created on the Hadoop cluster (or another database) connected to that system. With SAP TMS, the metadata of the Datavard Glue objects is transported to the SAP quality, test, or production system. Using Glue TMS, the user can then trigger the creation of the Glue objects on the target system based on the imported metadata. For Glue tables, this step also creates the corresponding Hadoop tables on the quality, test, or production Hadoop cluster or database.
Hadoop components utilized by Datavard Glue
Hadoop
The Apache Hadoop software library is a framework that allows distributed processing of large data sets across computer clusters using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
HttpFS / WebHDFS
HttpFS and WebHDFS are very similar HTTP-based services. The main difference lies in the handling of redirects issued by the Hadoop NameNode: HttpFS handles the redirect itself, while WebHDFS requires the client to follow it. The recommended Hadoop version is 2.6.0 or higher, where major supportability improvements and bug fixes were applied to WebHDFS and HttpFS.
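The client-side difference can be sketched as follows. The helper below only builds the WebHDFS REST URL for reading a file; the host name, path, and user are placeholders, and the port shown is the Hadoop 2.x NameNode HTTP default.

```python
# Sketch of how a client reads a file over WebHDFS vs. HttpFS.
# Host, path, and user below are placeholders for a real cluster.

def webhdfs_open_url(host, path, user, port=50070):
    """REST URL for reading a file via WebHDFS (served by the NameNode).

    Port 50070 is the Hadoop 2.x NameNode HTTP default.
    """
    return f"http://{host}:{port}/webhdfs/v1{path}?op=OPEN&user.name={user}"

# WebHDFS: the NameNode answers the OPEN request with an HTTP 307 redirect
# to the DataNode holding the data, and the client must follow it, e.g.:
#
#   import urllib.request
#   with urllib.request.urlopen(
#           webhdfs_open_url("namenode", "/glue/sales.csv", "glueuser")) as r:
#       data = r.read()   # urllib follows the 307 redirect automatically
#
# HttpFS exposes the same REST API (default port 14000) but acts as a gateway:
# it fetches the data from the cluster itself, so the client never sees a redirect.
```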
Hive
Hive is a data warehousing infrastructure based on Apache Hadoop. Hadoop provides massive scale out and fault tolerance capabilities for data storage and processing on commodity hardware.
Hive is designed to enable easy data summarization, ad-hoc querying, and analysis of large volumes of data. It provides a SQL interface for these tasks, and at the same time gives users multiple places to integrate their own functionality for custom analysis, such as User Defined Functions (UDFs).
Impala
The Apache Impala project provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats. The fast response for queries enables interactive exploration and fine-tuning of analytic queries, rather than long batch jobs traditionally associated with SQL-on-Hadoop technologies.