(GLUE-1808) Azure Data Lake
This guide describes how to connect Datavard Storage Management to Azure Data Lake Storage (ADLS). The storage can serve as a target for Glue extraction to CSV files, or as a storage layer for a Big Data solution such as Azure Databricks or Azure HDInsight.
Prerequisites
- Azure Data Lake Storage account
- Reuse Library transports imported
- SAP NetWeaver 7.10+
- Network connection between SAP system and Azure environment
Steps on Azure
These steps are usually performed by the customer in preparation for the implementation.
Creating Application Registration
The application registration is used to authenticate to ADLS. To create a new application registration, follow these steps:
- Go to Azure Active Directory → App registrations → New application registration.
- Fill in the required fields and click Create.
- Note down the Application ID; it is required later during Storage Management configuration.
- Click Settings → Required permissions and add a permission for Azure Data Lake.
- Click Keys and generate a new key. Note down the key, as it is required later during configuration.
Creating landing folder
The directory that will hold all new files extracted from the SAP system must be created upfront with the correct permissions.
- Go to the Data explorer of your ADLS resource and click New Folder in the desired location.
- Open the new folder, click Access, and grant access to the App Registration created in the previous section.
Tenant
The unique identifier of your organization (the tenant ID) is also required for authentication. To get this value, follow the steps below:
- Go to Azure Active Directory → App registrations → Endpoints.
- Copy the OAUTH 2.0 AUTHORIZATION ENDPOINT and extract the ID part from it. In the example https://login.microsoftonline.com/6fdc3117-ec29-4d73-8b33-028c513372/oauth2/authorize, the ID is the path segment between the host name and /oauth2.
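Extracting the ID by hand is error-prone, so the step can also be scripted. The following is a minimal sketch in Python, assuming the endpoint always has the form shown in the example above; the function name extract_tenant_id is hypothetical and not part of any Datavard or Azure tooling.

```python
from urllib.parse import urlparse


def extract_tenant_id(authorization_endpoint: str) -> str:
    """Return the first path segment of the endpoint URL, which holds the tenant ID."""
    parsed = urlparse(authorization_endpoint)
    # Split the path and drop empty segments caused by leading/trailing slashes.
    segments = [s for s in parsed.path.split("/") if s]
    if parsed.hostname != "login.microsoftonline.com" or not segments:
        raise ValueError(f"Unexpected authorization endpoint: {authorization_endpoint}")
    return segments[0]


if __name__ == "__main__":
    # Example endpoint taken from this guide; replace it with your own value.
    endpoint = (
        "https://login.microsoftonline.com/"
        "6fdc3117-ec29-4d73-8b33-028c513372/oauth2/authorize"
    )
    print(extract_tenant_id(endpoint))
```

The returned value is what goes into the TENANT field of the authentication profile on the SAP side.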
Steps on SAP
Once preparation on the Azure side is complete, fill in the required information on the SAP side to establish the connection.
STRUST
The Microsoft root CA certificate must be present in STRUST to establish a secure SSL connection.
- Using your internet browser, export the CA public certificate to a file.
- In STRUST, import this certificate into the SSL client (Anonymous) PSE.
- Go to transaction SMICM (ICM Monitor) and restart the ICM service.
RFCs
Two RFC destinations must be created in SM59 to establish the connection to ADLS.
- Create an RFC destination of type G for Azure Active Directory with the target host set to login.microsoftonline.com. This destination represents the connection to the authorization server that grants the authentication token for ADLS. Set SSL to Active and Certificate list to ANONYM.
- Create an RFC destination of type G for Azure Data Lake. Set the target host to your ADLS address (e.g. clazuhdi02.azuredatalakestore.net) and the path prefix to /webhdfs/v1/<Path to landing folder>. Set SSL to Active and Certificate list to ANONYM.
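To illustrate what the second destination resolves to, the sketch below builds a WebHDFS request URL from the target host and path prefix. It is only an illustration in Python, not how Storage Management issues its calls. The helper names and the folder landing/sap are hypothetical; LISTSTATUS is a standard WebHDFS operation that lists a directory and is a convenient way to check access to the landing folder from outside SAP.

```python
from urllib.parse import quote, urlencode


def build_liststatus_url(host: str, landing_path: str) -> str:
    """Combine target host and path prefix into a WebHDFS LISTSTATUS URL."""
    # Same layout as the RFC path prefix: /webhdfs/v1/<Path to landing folder>
    prefix = "/webhdfs/v1/" + quote(landing_path.strip("/"))
    return f"https://{host}{prefix}?" + urlencode({"op": "LISTSTATUS"})


def build_headers(access_token: str) -> dict:
    """The OAuth 2.0 token is passed as a Bearer token in the Authorization header."""
    return {"Authorization": f"Bearer {access_token}"}


if __name__ == "__main__":
    # Host from the example in this guide; "landing/sap" is a made-up folder.
    print(build_liststatus_url("clazuhdi02.azuredatalakestore.net", "/landing/sap/"))
```

If a request to this URL with a valid token fails, the usual suspects are the folder permissions granted to the App Registration or a wrong path prefix.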
Authentication profile
An authentication profile containing the login information must be created in table /DVD/OAUTH_CONF:
- OAUTH_PROFILE: any value chosen by the customer to identify the profile used for authentication
- CLIENT_ID: the Application ID from chapter Creating Application Registration
- CLIENT_SECRET: the key generated in chapter Creating Application Registration
- GRANT_TYPE: fixed value client_credentials
- RESOURCE: fixed value https://datalake.azure.net/
- TENANT: the identifier found in chapter Tenant
- URL: left blank
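These fields correspond to the parameters of an OAuth 2.0 client credentials request against Azure Active Directory. The sketch below shows how such a request could be assembled in Python, which can help when verifying the credentials outside SAP. It assumes the Azure AD v1 token endpoint (/oauth2/token under the tenant); how Storage Management builds its request internally is not described in this guide, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_token_request(tenant: str, client_id: str, client_secret: str) -> Request:
    """Assemble the token request from the values stored in the profile."""
    # Assumption: Azure AD v1 token endpoint for the given tenant.
    url = f"https://login.microsoftonline.com/{tenant}/oauth2/token"
    body = urlencode({
        "grant_type": "client_credentials",         # GRANT_TYPE
        "client_id": client_id,                     # CLIENT_ID
        "client_secret": client_secret,             # CLIENT_SECRET
        "resource": "https://datalake.azure.net/",  # RESOURCE
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_token(request: Request) -> str:
    """Send the request and return the access_token field of the JSON response."""
    with urlopen(request) as response:
        return json.load(response)["access_token"]


if __name__ == "__main__":
    # Placeholders only; nothing is sent unless fetch_token() is called.
    req = build_token_request("<tenant>", "<application-id>", "<key>")
    print(req.full_url)
```

Calling fetch_token(req) with real values needs network access to login.microsoftonline.com. An HTTP error at this point typically indicates a wrong tenant, Application ID, or key.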
Linking Authentication profile
The next step is to link the authentication profile with the RFC destinations created earlier. This is done in table /DVD/HDP_AUT_OA2.
Setting Authentication method
The authentication method must be set to OAUTH2.0 in table /DVD/HDP_CUS_C.
Creating storage in Datavard Storage Management
After the configuration is done, define a storage that will serve as the target for extraction.
- Go to transaction /DVD/SM_SETUP.
- Switch to Edit Mode and click New Storage.
- Create a new storage of type HADOOP and fill in the RFC destination.