(SM 2011) Hadoop
- SNP
Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications running in clustered systems. It can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing, and analyzing data than relational databases and data warehouses provide.
For more information, see the page Apache Hadoop.
Supported versions
Hadoop components
The Hadoop ecosystem refers to the various components of the Apache Hadoop software library as well as the accessories and tools provided by the Apache Software Foundation for these types of software projects and to the ways that they work together.
In Datavard Storage Management, the following components of the Hadoop ecosystem are used:
Component | Supported versions |
---|---|
Hive | 1.1.0 and higher |
Impala | 2.3.0 and higher |
WebHDFS | any |
Hadoop distributions
Commercial distributions of Hadoop are offered by various vendors of big data platforms.
Datavard Storage Management basically supports all the distributions which use Hive and HDFS (or HDFS-like storage).
Verified vendors:
- Cloudera (5.x and higher)
- Hortonworks (min 2.6)
- MapR
- Azure Databricks
Datavard is certified by
Setup
See the chapter (SM 2011) Hadoop Storage Setup for instructions on the Hadoop storage setup.
Detailed technical information
See the chapter (SM 2011) SAP-Hadoop Communication for more detailed technical information regarding communication between SAP and Hadoop.