(SM-1911) Azure Data Lake Gen2
This guide describes the process required to establish a connection from Datavard Storage Management to Azure Data Lake Storage Gen2.
Prerequisites
- Azure Data Lake Storage Gen2 account
- A network connection between the SAP system and the Azure environment
Azure storage configuration
Perform the following steps before configuring the SAP system.
Application registration
The application registration is used to authenticate to ADLS. It is required only if Azure Active Directory (AAD) OAuth authentication is used. A simpler alternative is to use a SAS token.
To register a new application, follow these steps:
1. Go to Azure Active Directory > App registrations > New application registration.
2. Fill in the required fields and click Create.
3. Note down the Application (client) ID and the Directory (tenant) ID, as they are required later during the Storage Management configuration.
4. Click Certificates & secrets and generate a new client secret. Note down the secret value immediately, as the portal displays it only once and it is used later during the configuration.
Creating a landing folder
Create a folder where all files extracted from the SAP system will be stored, and set the correct permissions on it.
- Open Microsoft Azure Storage Explorer and identify the folder that will be used with the storage.
- Make sure that the app registration created previously has read, write, and execute (rwx) access to the directory, and that the Default permissions are also checked, so that new files and subfolders inherit them.
- The app registration also needs execute permission on ALL parent directories and on the filesystem root.
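The permission layout above can be expressed in the short POSIX-style ACL notation used by ADLS Gen2, where an entry looks like user:<object-id>:rwx and a default: prefix marks an entry that new children inherit. The following is a minimal sketch for illustration only: required_acl_entries is a hypothetical helper, the landing folder is made up, and <object-id> stands for the object ID of your app registration's service principal.

```python
from pathlib import PurePosixPath


def required_acl_entries(landing_folder: str, object_id: str) -> dict[str, list[str]]:
    """Return the ACL entries the app registration needs, keyed by path.

    Paths are relative to the filesystem; "/" stands for the filesystem root.
    """
    folder = PurePosixPath("/") / landing_folder.strip("/")
    entries: dict[str, list[str]] = {}
    # The filesystem root and every parent directory need execute (x) only,
    # so that the service principal can traverse down to the landing folder.
    for parent in list(folder.parents)[::-1]:
        entries[str(parent)] = [f"user:{object_id}:--x"]
    # The landing folder needs rwx access, plus a default entry that new
    # files and subfolders inherit.
    entries[str(folder)] = [
        f"user:{object_id}:rwx",
        f"default:user:{object_id}:rwx",
    ]
    return entries


if __name__ == "__main__":
    # Hypothetical landing folder and placeholder object ID.
    for path, acl in required_acl_entries("sap/landing", "<object-id>").items():
        print(path, ",".join(acl))
```

For the hypothetical folder sap/landing, this yields execute-only entries for / and /sap, and rwx plus default rwx for /sap/landing, which mirrors the two checks listed above.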
SAP system configuration
After the preparation on the Azure side is complete, fill in the required information on the SAP side to establish a connection.
STRUST
The Microsoft root certificate authority (CA) certificate must be present in STRUST to establish a secure SSL connection.
1. Using your web browser, export the public certificate of the Microsoft root CA to a file, as shown in the figure below.
2. In STRUST, import this certificate into the SSL Client (Anonymous) PSE, add it to the certificate list, and save.
3. Go to transaction SMICM and restart the ICM services, as shown in the figure.
Storage RFC
Create an RFC destination of type G for the Azure Data Lake Storage Gen2 primary endpoint.
- Set Target Host to your ADLS address (e.g. dvdadls2.dfs.core.windows.net) and Path Prefix to /<filesystem>/<Path to landing folder>.
- Set SSL to "Active" and Certificate list to "ANONYM".
- Set HTTP Version to 1.1 and Compression to Active.
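The relationship between the storage account, the Target Host, and the Path Prefix can be sketched as follows. This is an illustration only: storage_rfc_settings is a hypothetical helper, dvdadls2 is the example account used in this guide, and the filesystem and folder names in the usage are made up.

```python
def storage_rfc_settings(account: str, filesystem: str, landing_folder: str) -> dict[str, str]:
    """Derive the Target Host and Path Prefix of the storage RFC (type G)."""
    # ADLS Gen2 primary endpoint of the storage account.
    target_host = f"{account}.dfs.core.windows.net"
    # Path Prefix = /<filesystem>/<Path to landing folder>, without empty segments.
    segments = [filesystem, *landing_folder.split("/")]
    path_prefix = "/" + "/".join(s for s in segments if s)
    return {
        "target_host": target_host,
        "path_prefix": path_prefix,
        # Base HTTPS URL that the RFC destination points to.
        "url": f"https://{target_host}{path_prefix}",
    }


if __name__ == "__main__":
    print(storage_rfc_settings("dvdadls2", "datalake", "sap/landing"))
```

Leading or trailing slashes in the folder path are dropped, so "/sap/landing/" and "sap/landing" produce the same Path Prefix.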
Authentication methods
You can use one of two authentication methods for ADLS Gen2: OAuth 2.0 or a SAS token.
oAuth 2.0
Authentication RFC
Start by creating an RFC destination of type G for Microsoft Azure Active Directory. This RFC represents the connection to the authorization server that issues the authentication token for ADLS.
- Set Target Host to login.microsoftonline.com.
- Set SSL to "Active" and Certificate list to "ANONYM".
Authentication profile
The authentication profile contains the login information. Create it in the table /DVD/OAUTH_CONF with the following values:
- OAUTH_PROFILE - any value you choose to identify the profile used for authentication
- CLIENT_ID - the Application (client) ID noted in the section Application registration
- CLIENT_SECRET - the client secret created in the section Application registration. It can optionally be hashed by report /DVD/XOR_GEN.
- GRANT_TYPE - the fixed value "client_credentials"
- RESOURCE - the fixed value "https://storage.azure.com/"
- TENANT - the Directory (tenant) ID noted in the section Application registration
- URL - leave blank
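For orientation, the profile values correspond to the fields of an Azure AD client-credentials token request. The sketch below assumes the Azure AD v1 token endpoint (https://login.microsoftonline.com/<tenant>/oauth2/token), which is the endpoint that accepts a resource parameter. It only builds the request and does not describe how Datavard Storage Management sends it internally; token_request is a hypothetical helper, and the client secret is the plain (not hashed) value.

```python
from urllib.parse import urlencode

TOKEN_HOST = "login.microsoftonline.com"  # Target Host of the authentication RFC


def token_request(tenant: str, client_id: str, client_secret: str, resource: str) -> tuple[str, str]:
    """Return the URL and form-encoded body of a client-credentials token request."""
    url = f"https://{TOKEN_HOST}/{tenant}/oauth2/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",  # GRANT_TYPE
            "client_id": client_id,              # CLIENT_ID
            "client_secret": client_secret,      # CLIENT_SECRET (plain value)
            "resource": resource,                # RESOURCE
        }
    )
    return url, body


if __name__ == "__main__":
    # Placeholder values only.
    print(token_request("<tenant-id>", "<client-id>", "<client-secret>", "<resource>"))
```

The body is sent as an HTTP POST with Content-Type application/x-www-form-urlencoded. The JSON response contains an access_token, which is then passed to ADLS in an Authorization: Bearer header.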
Linking authentication profile
Next, in the table /DVD/HDP_AUT_OA2, link the authentication profile to the RFC destinations created previously.
Setting the authentication method
The authentication method must be set to OAUTH2.0 in the table /DVD/HDP_CUS_C.
Creating a storage in Datavard Storage Management
After the configuration is complete, you need to define the storage that serves as a target for the extraction.
- Go to the transaction /DVD/SM_SETUP.
- Switch to Edit mode and click New Storage.
- Create a new storage of type ADLS_GEN2 and fill in the RFC destination.
Specify the following parameters:
- HTTP RFC Destination – Storage RFC destination created previously
- Use oAuth token - Select to use OAuth 2.0 authentication.
- HTTP call repeat - Number of retries if a request to ADLS Gen2 fails.
- Repeat delay (seconds) - Delay in seconds between retries. If left empty (0), the default value of 3 seconds is used.
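The two retry parameters can be illustrated with a small sketch. It assumes that HTTP call repeat counts additional attempts after the first failed one, and that a failed request surfaces as an OSError (which includes ConnectionError and urllib's URLError). Both are assumptions about the behavior, not the product's implementation, and call_with_repeat is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

DEFAULT_REPEAT_DELAY = 3  # seconds, used when "Repeat delay" is empty (0)


def call_with_repeat(
    request: Callable[[], T],
    repeat: int,
    delay: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying up to `repeat` times when it raises OSError."""
    effective_delay = delay if delay > 0 else DEFAULT_REPEAT_DELAY
    retries_left = max(repeat, 0)
    while True:
        try:
            return request()
        except OSError:
            # No repetitions left: propagate the last error to the caller.
            if retries_left == 0:
                raise
            retries_left -= 1
            sleep(effective_delay)
```

With HTTP call repeat set to 2 and Repeat delay left empty, a request that fails twice and then succeeds is attempted three times, with two 3-second pauses in between. The sleep function is injectable so that the behavior can be checked without waiting.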
SAS Token
A SAS token can also be used to authenticate to Azure Data Lake Storage Gen2. However, this method offers fewer security management options. The token is a bearer credential, so anyone who holds it can use its permissions until it expires or is revoked.
If you use a SAS token, you can skip all steps in the section Azure storage configuration except creating the landing folder. You can also skip the oAuth 2.0 section.
To generate the SAS token, go to the Azure portal (see the screenshot below for more information).
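Before entering a generated SAS token, it can help to check its permissions and expiry. A SAS token is a query string in which the sp field holds the signed permissions and se holds the signed expiry time in UTC. The sketch below assumes that se carries a full timestamp such as 2030-01-31T12:00:00Z. describe_sas and sas_is_expired are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs


def describe_sas(sas_token: str) -> dict:
    """Extract the signed permissions (sp) and expiry (se) from a SAS token."""
    fields = parse_qs(sas_token.lstrip("?"))
    # fromisoformat() parses "+00:00" since Python 3.7 but "Z" only since 3.11.
    expiry = datetime.fromisoformat(fields["se"][0].replace("Z", "+00:00"))
    return {"permissions": fields.get("sp", [""])[0], "expires": expiry}


def sas_is_expired(sas_token: str, now: Optional[datetime] = None) -> bool:
    """Return True if the token's expiry time has passed."""
    now = now or datetime.now(timezone.utc)
    return describe_sas(sas_token)["expires"] <= now


if __name__ == "__main__":
    # Made-up token for illustration; the signature is not valid.
    sample = "sv=2020-08-04&ss=b&srt=sco&sp=rwdlac&se=2030-01-31T12:00:00Z&spr=https&sig=abc%3D"
    print(describe_sas(sample), sas_is_expired(sample))
```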
Creating a storage in Datavard Storage Management
After the configuration is complete, you need to define the storage that serves as a target for the extraction.
- Go to the transaction /DVD/SM_SETUP.
- Switch to Edit mode and click New Storage.
- Create a new storage of type ADLS_GEN2 and fill in the RFC destination.
Specify the following parameters:
- HTTP RFC Destination – Storage RFC destination created previously
- Use SAS token - Select to use SAS token authentication.
- HTTP call repeat - Number of retries if a request to ADLS Gen2 fails.
- Repeat delay (seconds) - Delay in seconds between retries. If left empty (0), the default value of 3 seconds is used.
- SAS token - Shared Access Signature token generated in the Azure portal, as described above, and optionally hashed by report /DVD/XOR_GEN.
Note the following about the SAS token value:
- Remove the question mark '?' at the beginning of the generated SAS token.
- If the value ends with '%3D', replace this suffix with the equals sign '='.
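The two rules for the SAS token value can be captured in a small helper before the value is pasted into the storage settings. normalize_sas_token is a hypothetical name. Trimming surrounding whitespace is an extra step added here because copied values often carry it.

```python
def normalize_sas_token(token: str) -> str:
    """Apply the SAS token rules: no leading '?', trailing '%3D' becomes '='."""
    token = token.strip()  # drop whitespace picked up when copying from the portal
    if token.startswith("?"):
        token = token[1:]
    if token.endswith("%3D"):
        token = token[: -len("%3D")] + "="
    return token


if __name__ == "__main__":
    # Made-up token for illustration.
    print(normalize_sas_token("?sv=2020-08-04&sp=rl&sig=abc%3D"))
```

A value that already follows both rules is returned unchanged, so the helper can be applied safely more than once.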
Advanced
Failover storage
Failover (read-only) storage lets you use Azure Storage redundancy, e.g. read-access geo-redundant storage (RA-GRS) or read-access geo-zone-redundant storage (RA-GZRS), to improve high availability and disaster recovery. For more information about Azure replication strategies, refer to the official Microsoft documentation: https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy.
When the primary storage is unavailable (i.e. the connection check fails), the application automatically switches to the failover storage and reads data from a data center in the secondary region. Follow the procedure below to enable it on the SAP side.
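The switching rule described above can be sketched as follows. The connection check is passed in as a function because the actual check used by the application is not described here. choose_endpoint and is_reachable are hypothetical, and refusing writes during failover reflects the read-only nature of the secondary endpoint.

```python
from typing import Callable


def choose_endpoint(
    primary: str,
    failover: str,
    is_reachable: Callable[[str], bool],
    for_write: bool = False,
) -> str:
    """Pick the endpoint for a request, falling back to the read-only failover."""
    if is_reachable(primary):
        return primary
    if for_write:
        # The secondary region is read-only, so writes cannot fail over.
        raise ConnectionError(
            f"primary storage {primary} is unavailable; failover storage is read-only"
        )
    return failover
```

Reads continue against the secondary endpoint while the primary is down. Writes fail until the primary becomes reachable again.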
Enable Failover storage
- Run transaction /DVD/RL_SETT_EXPERT.
- Specify "SM" in the Tool name parameter and execute (F8).
- Find the parameter "Failover storage", click the "Edit" button, and set the value to "X".
After that, additional parameters appear in the Storage Management settings (transaction /DVD/SM_SETUP), where you can specify the HTTP destination of your failover storage and its SAS token.
Failover storage RFC
Create an RFC destination of type G for the Azure Data Lake Storage Gen2 secondary endpoint.
- Set Target Host to your secondary ADLS address, which is the primary address with the suffix -secondary appended to the account name (e.g. dvdadls2-secondary.dfs.core.windows.net), and Path Prefix to /<filesystem>/<Path to landing folder>.
- Set SSL to "Active" and Certificate list to "ANONYM".
- Set HTTP Version to 1.1 and Compression to Active.
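The secondary Target Host can be derived mechanically from the primary one, because only the account name changes. secondary_host is a hypothetical helper, and dvdadls2 is the example account used in this guide.

```python
def secondary_host(primary_host: str) -> str:
    """Derive the secondary (read-access) endpoint host from the primary host."""
    account, sep, domain = primary_host.partition(".")
    if not sep or not account or not domain:
        raise ValueError(f"not a fully qualified host name: {primary_host!r}")
    # Only the account name gets the suffix; the domain stays the same.
    return f"{account}-secondary.{domain}"


if __name__ == "__main__":
    print(secondary_host("dvdadls2.dfs.core.windows.net"))
```

The Path Prefix stays the same for both RFC destinations, because the secondary endpoint exposes the same filesystem and folder structure.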