(SM-2408) Azure Databricks


1. Prerequisites on Azure portal

To create an Azure Databricks workspace and cluster, refer to Get started with Azure Databricks - Azure Databricks documentation.

Access the Databricks workspace to create a new compute cluster or start an existing one.

Your account must have Owner or Contributor privileges on the Databricks workspace to access it.
By default, the Azure Databricks cluster is not running. Either create a new cluster or start the existing one (be sure to change the cluster filter to show all clusters).

In the Advanced Options of the cluster, set up the Spark config for the connection to ADLS Gen 2 storage. The configuration is a set of key-value pairs, with each key separated from its value by a single space. The OAuth parameter values are the same as the credentials used in ADLS.


Parameter | Spark config key | Value
Spark access to ADLS2 | spark.hadoop.fs.azure.account.key.somestorageaccount.dfs.core.windows.net | 2f+++++++++++++++++++++++++XRww==
OAuth access type | spark.hadoop.dfs.adls.oauth2.access.token.provider.type | ClientCredential
OAuth login endpoint | spark.hadoop.dfs.adls.oauth2.refresh.url | https://login.microsoftonline.com/6fdc3117-ec29-4d73-8b33-028c8c300872/oauth2/token
OAuth secret | spark.hadoop.dfs.adls.oauth2.credential | +gv+++++++++++++++++++++++++++++++++++++M74=
OAuth client ID | spark.hadoop.dfs.adls.oauth2.client.id | 74731c8c-7290-4998-8005-1d0670cbe909
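
As an illustration of the key/value layout described above, the following minimal Python sketch renders a set of settings into text that could be pasted into the Spark config field. It assumes one pair per line, with the key separated from the value by a single space. The function name render_spark_config and the placeholder values in angle brackets are hypothetical; substitute the credentials used for your ADLS storage.

```python
def render_spark_config(settings):
    """Render settings as Spark config text: one 'key value' pair per line."""
    lines = []
    for key, value in settings.items():
        # A key containing whitespace would be split incorrectly,
        # because the first space separates the key from the value.
        if any(ch.isspace() for ch in key):
            raise ValueError(f"whitespace is not allowed in a key: {key!r}")
        lines.append(f"{key} {value}")
    return "\n".join(lines)


# Placeholder values only (assumptions for illustration).
settings = {
    "spark.hadoop.fs.azure.account.key.<StorageAccountName>.dfs.core.windows.net": "<AccountKey>",
    "spark.hadoop.dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
    "spark.hadoop.dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<TenantId>/oauth2/token",
    "spark.hadoop.dfs.adls.oauth2.credential": "<ClientSecret>",
    "spark.hadoop.dfs.adls.oauth2.client.id": "<ClientId>",
}

config_text = render_spark_config(settings)
print(config_text)
```

Keeping the settings in a dictionary makes it harder to introduce a stray space or a duplicated key when the values are edited by hand.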

 

Write down the server hostname, port, and JDBC URL of the cluster; they will be used in the SM storage definition.

 

To create a database, run the SQL query CREATE DATABASE IF NOT EXISTS <DatabaseName> in a notebook of the workspace.

 

Create a token for remote access in User Settings (top right corner icon).

Save the token for later use in the SAP configuration.
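
Before entering the token in SAP, it can be useful to confirm that the workspace accepts it. The sketch below is one possible check, not part of the SM setup itself: it sends the token as a bearer token to the Databricks REST API endpoint /api/2.0/clusters/list and reports whether the call returns HTTP 200. The function names and the timeout value are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def build_clusters_request(server_hostname, token):
    """Build (but do not send) a request that lists clusters using the token."""
    url = f"https://{server_hostname}/api/2.0/clusters/list"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def token_is_accepted(server_hostname, token, timeout=10):
    """Return True if the workspace answers the request with HTTP 200."""
    request = build_clusters_request(server_hostname, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (for example 401/403), DNS failures and timeouts.
        return False
```

For example, token_is_accepted("<ServerHostname>.azuredatabricks.net", "<Token>") returns False when the token is rejected or the host cannot be reached.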

 

2. Networking

To enable communication between the SAP system and the Azure Databricks host, the following hosts and ports need to be reachable from the SAP system:

Port | Address
443 | <ServerHostname>.azuredatabricks.net
443 | <StorageAccountName>.blob.core.windows.net

<StorageAccountName>.blob.core.windows.net is the internal DBFS storage location, generated automatically by Azure Databricks. It is used for query results larger than 1 MB. The name of the internal storage can be found in the JSON view of the Azure Databricks service in the Azure portal.
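
Reachability of the hosts and ports listed above can be checked with a plain TCP connection attempt. The following is a minimal sketch, assuming Python is available on the SAP application server; the function names and the default timeout are hypothetical. A successful connection only shows that the port is open through the firewall, not that authentication will work.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts.
        return False


def check_endpoints(endpoints):
    """Map each 'host:port' to the result of the reachability check."""
    return {f"{host}:{port}": is_reachable(host, port) for host, port in endpoints}
```

Calling check_endpoints with the two addresses from the table, each on port 443, returns a dictionary that shows which of them could be reached.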

3. Set up Java connector

The Java connector is a critical middleware component. Follow the steps in the Java Connector Setup guide to set it up before you continue.

 

4. Create ADLS Gen 2 storage

ADLS storage is required for temporary data. To create the storage, follow the guide (SM-2408) Azure Data Lake Gen2.

5. Download Spark JDBC drivers

Download the Databricks JDBC driver from JDBC Drivers Download – Databricks.

Upload the .jar file to the SAP application server at /sapmnt/<SID>/global/security/dvd_conn/databricks

The file needs to be owned by <sid>adm:sapsys
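
The ownership of the uploaded driver can be verified with a short script. This is a sketch for POSIX systems only; the function name file_owner is hypothetical, and the path to pass in is the location of the .jar file from the step above.

```python
import grp
import os
import pwd


def file_owner(path):
    """Return the (user, group) names that own the file at path (POSIX only)."""
    info = os.stat(path)
    user = pwd.getpwuid(info.st_uid).pw_name
    group = grp.getgrgid(info.st_gid).gr_name
    return user, group
```

For a correctly installed driver, file_owner should return the <sid>adm user and the sapsys group.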

6. Create Databricks storage

Go to transaction /DVD/SM_SETUP
Create new storage of type SM_TRS_MS

More information about the setup of storage type SM_TRS_MS can be found in (SM-2408) Hadoop Storage Setup.

 

Examples of Azure Databricks & ADLS Gen 2 storage configuration are shown below:

General Tab

 

Referenced storage: Storage ID of the ADLS Gen 2 storage

Hadoop Tab

 

Drivers Tab

 

Security Tab

 

Advanced Tab