(SM-2208) Azure Databricks & ADLS Gen 2 storage

Table of Contents:

1. Prerequisites on Azure portal
2. Set up Java connector
3. Create ADLS Gen 2 storage
4. Download Spark JDBC drivers
5. Create Databricks storage

1. Prerequisites on Azure portal

The entry point to Azure Databricks is https://portal.azure.com. From the home page navigate to the Databricks section.

Your account must have Owner or Contributor privileges on the Databricks workspace to be able to access it.
By default, the Azure Databricks cluster is not running. Either create a new cluster or start the existing one (be sure to change the cluster filter to show all clusters).

In Advanced Options, set up the Spark config for the connection to ADLS Gen 2 storage. The configuration is a set of key-value pairs, with each key separated from its value by a single space. The OAuth parameter values are the same credentials that are used for ADLS.


Spark access to ADLS2:  spark.hadoop.fs.azure.account.key.somestorageaccount.dfs.core.windows.net 2f+++++++++++++++++++++++++XRww==
OAuth access type:      spark.hadoop.dfs.adls.oauth2.access.token.provider.type ClientCredential
OAuth login endpoint:   spark.hadoop.dfs.adls.oauth2.refresh.url https://login.microsoftonline.com/6fdc3117-ec29-4d73-8b33-028c8c300872/oauth2/token
OAuth secret:           spark.hadoop.dfs.adls.oauth2.credential +gv+++++++++++++++++++++++++++++++++++++M74=
OAuth client ID:        spark.hadoop.dfs.adls.oauth2.client.id 74731c8c-7290-4998-8005-1d0670cbe909
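
To verify that the cluster can reach the storage account after this configuration, you can list a path from a notebook cell. Below is a minimal sketch; the container name somecontainer and the somestorageaccount account are placeholders to be replaced with your real values.

# Python notebook cell: list the container root through the ABFS driver configured above.
# "somecontainer" and "somestorageaccount" are assumed names - replace with your own.
for f in dbutils.fs.ls("abfss://somecontainer@somestorageaccount.dfs.core.windows.net/"):
    print(f.path)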

Write down the server hostname, port, and JDBC URL (shown on the cluster's JDBC/ODBC tab under Advanced Options); they will be used in the SM storage definition.
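
For reference, the JDBC URL on that tab typically follows a pattern like the one below; the hostname, workspace ID, and cluster ID are placeholders for your own values, and the token is the personal access token created later in this section.

jdbc:spark://<server-hostname>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<workspace-id>/<cluster-id>;AuthMech=3;UID=token;PWD=<personal-access-token>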

To create a database, first create a notebook for submitting SQL queries.

Now you can run a CREATE DATABASE SQL query.
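
A minimal sketch of such a notebook cell is shown below; the database name sm_db is only an example, so use whichever name your SM storage definition will reference.

# Python notebook cell: create the database that SM will use; "sm_db" is an assumed name.
spark.sql("CREATE DATABASE IF NOT EXISTS sm_db")
spark.sql("SHOW DATABASES").show()  # confirm the database is visible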

Create a personal access token for remote access in User Settings (icon in the top right corner).

Save the token for later use in the SAP configuration.
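
To confirm that the token works before entering it on the SAP side, you can call the Databricks REST API with it. A minimal sketch, assuming the placeholder workspace URL and token are replaced with your own values and that the requests library is available:

import requests

# Placeholders - replace with your workspace URL and the token generated above.
workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"
token = "<personal-access-token>"

resp = requests.get(
    f"{workspace_url}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()  # fails if the token or URL is wrong
print(resp.json())       # lists the clusters visible to the token's owner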

2. Set up Java connector

The Java connector is a critical middleware component. Please follow the steps in the Java Connector Setup guide to set it up before you continue.

3. Create ADLS Gen 2 storage

ADLS storage is required as storage for temporary data. Please follow this guide to create the storage.

4. Download Spark JDBC drivers

Download the Spark JDBC driver from https://databricks.com/spark/odbc-driver-download

Upload the .jar file to the SAP application server at /sapmnt/<SID>/global/security/dvd_conn/spark/
The file needs to be owned by <sid>adm:sapsys
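
Optionally, you can smoke-test the downloaded driver before it is used by the Java connector. The sketch below uses the third-party jaydebeapi package from any machine that has Python, a JVM, and the jar available; the driver class name, jar file name, and all placeholder values are assumptions that have to be checked against your actual download and workspace.

import jaydebeapi

# Use the JDBC URL noted in section 1 (including AuthMech=3;UID=token;PWD=<personal-access-token>).
jdbc_url = "<JDBC URL from the cluster's JDBC/ODBC tab>"

conn = jaydebeapi.connect(
    "com.simba.spark.jdbc.Driver",  # assumed driver class - verify against the jar you downloaded
    jdbc_url,
    jars="/sapmnt/<SID>/global/security/dvd_conn/spark/SparkJDBC42.jar",  # assumed jar name
)
curs = conn.cursor()
curs.execute("SHOW DATABASES")
print(curs.fetchall())
curs.close()
conn.close()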

5. Create Databricks storage

Go to transaction /DVD/SM_SETUP.
Create a new storage of type SM_TRS_MS.

This storage uses the storage type SM_TRS_MS; more information about its setup can be found in the SM_TRS_MS storage type documentation.

Below is a sample Azure Databricks & ADLS Gen 2 storage configuration:

General Tab

Referenced storage: Storage ID of the ADLS Gen 2 storage

Hadoop Tab

Drivers Tab

Security Tab

Advanced Tab