21st September, 2019
  • Category: D365 Customer Engagement

Connect Azure Data Lake with Databricks

Introduction:

In this blog, we will learn how to connect Azure Data Lake with Databricks.

Pre-requisites:

1. A user with the Contributor role in the Azure subscription.
2. An Azure Data Lake Storage account.

Problem Statement:

We have data stored in Azure Data Lake in CSV format and want to analyze it using the Azure Databricks service.

Steps:

1. Register an Azure AD Application.
2. Assign the Contributor and Storage Blob Data Contributor roles to the registered application. Note that only an account administrator can assign roles.
3. Create an Azure Databricks resource.
4. In Databricks, perform the steps below.
In the left pane, select Workspace. From the Workspace drop-down, select Create > Notebook.

5. In the Create Notebook dialog box, enter a name for the notebook. Select Scala as the language, and then select the Spark cluster that you created earlier.

Select Create.

6. It will present a blank notebook for you.

Run the below code:

 
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<application-id>",
  "fs.azure.account.oauth2.client.secret" -> "<client-secret>",
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

Replace the application-id, client-secret, and tenant-id values with those from the Azure AD application registered in step 1.
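As a sketch, a filled-in configuration might look like the following; the IDs and secret below are made-up placeholder values, not real ones:

```scala
// Hypothetical values for illustration only -- substitute your own.
val applicationId = "00000000-1111-2222-3333-444444444444" // application (client) ID
val clientSecret  = "my-client-secret"                     // client secret
val tenantId      = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee" // directory (tenant) ID

val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> applicationId,
  "fs.azure.account.oauth2.client.secret" -> clientSecret,
  "fs.azure.account.oauth2.client.endpoint" -> s"https://login.microsoftonline.com/$tenantId/oauth2/token")
```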

7. Insert a new Cell and run the code below. This will connect Databricks to Azure Data Lake.

dbutils.fs.mount(
  source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/",
  mountPoint = "/mnt/custommountpoint",
  extraConfigs = configs)

Replace file-system-name and storage-account-name with the actual values:
storage-account-name: name of the storage account
file-system-name: name of the container (file system)
custommountpoint: any meaningful name
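For illustration, here is how the abfss source URI is assembled from those two values (the names below are hypothetical):

```scala
// Hypothetical names -- replace with your own container and storage account.
val storageAccountName = "mydatalakestore"
val fileSystemName     = "mycontainer"

// The mount source follows the pattern:
// abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/
val source = s"abfss://$fileSystemName@$storageAccountName.dfs.core.windows.net/"
```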

The storage account name and file system (container) name can be found in the Azure portal, under the storage account's Containers blade.

Select the cell and hit Shift + Enter to execute the code.

8. Create a new Cell and run the below code.

val bookings = sqlContext.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/mnt/custommountpoint/<file-name>")

Replace file-name with the actual name of the file; our data is stored in CSV format.

Note: If you receive the following error, assign the Storage Blob Data Contributor role to the application at the Data Lake resource level:
StatusCode=403 StatusDescription=This request is not authorized to perform this operation using this permission.

Storage Account (Data Lake) > Access Control (IAM) > Assign a Role > Storage Blob Data Contributor.
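The same role assignment can also be done from the Azure CLI. A sketch, assuming the `az` CLI is installed and you are logged in; all the angle-bracket values are placeholders to substitute:

```shell
# Placeholders -- substitute your application ID and the storage account's resource ID.
az role assignment create \
  --assignee "<application-id>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"
```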

9. The next step is to write the data in Parquet format.

Enter the code below into a new cell and hit Shift + Enter to execute it.

bookings.write.mode("overwrite").parquet("bookings")

10. Read and display the data in the UI.
Enter the code below into a new cell and hit Shift + Enter to execute it.

val data = sqlContext.read.parquet("bookings")
display(data)

You can execute many Spark SQL commands against the data to generate reports:
https://docs.databricks.com/spark/latest/spark-sql/index.html
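As a sketch of that (assuming a running cluster, the `data` DataFrame from step 10, and a hypothetical `country` column in the CSV), you could register a temporary view and query it:

```scala
// Register the DataFrame as a temporary view so it can be queried with SQL.
data.createOrReplaceTempView("bookings_view")

// Hypothetical aggregate report; `country` is an assumed column name.
val report = spark.sql(
  "SELECT country, COUNT(*) AS total FROM bookings_view GROUP BY country")
display(report)
```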

Reference:

https://docs.databricks.com/spark/latest/spark-sql/index.html
https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake-gen2.html
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/storage/blobs/data-lake-storage-use-databricks-spark.md
