Databricks
This is a guide for using a Databricks Spark cluster with Mage.
Try our fully managed solution to access this advanced feature.
In addition to running Spark pipelines in an AWS EMR cluster or a standalone Spark cluster, Mage also supports running Spark pipelines in a Databricks cluster.
Set up
Here is an overview of the steps required to use Mage with a Databricks cluster:
- Set up Databricks cluster
- Use the Mage Databricks Docker image
- Configure environment variables
- Sample pipeline with PySpark code
- Verify everything worked
If you get stuck, run into problems, or just want someone to walk you through these steps, please join our Slack.
1. Set up Databricks cluster
Set up a Databricks workspace and cluster by following the Databricks documentation.
2. Use the Mage Databricks Docker image
Contact the Mage team to update your Mage Pro cluster to use the Mage Databricks Docker image.
3. Configure environment variables
Set the following environment variables in your Mage Pro cluster to enable connectivity with your Databricks workspace:
- DATABRICKS_HOST: The base URL of your Databricks workspace. Example: https://<your-databricks-instance>.cloud.databricks.com
- DATABRICKS_TOKEN: A personal access token (PAT) used for authenticating with Databricks. You can generate this token in the Databricks UI by navigating to Settings > Developer > Access Tokens.
- DATABRICKS_CLUSTER_ID: The unique identifier of the Databricks cluster where queries will be executed. You can find it in your Databricks workspace under Compute > Clusters; refer to the Databricks documentation for detailed steps on retrieving the cluster ID.
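For reference, the sketch below shows one way these variables are consumed from Python, using Databricks Connect to open a Spark session against the configured cluster. It is only an illustration of each variable's role (it assumes the databricks-connect package is installed), not a description of what the Mage Databricks image does internally.

```python
import os

from databricks.connect import DatabricksSession  # assumes databricks-connect is installed

# Fail fast if any required variable is missing.
for name in ('DATABRICKS_HOST', 'DATABRICKS_TOKEN', 'DATABRICKS_CLUSTER_ID'):
    if not os.getenv(name):
        raise RuntimeError(f'Missing environment variable: {name}')

# DatabricksSession picks up DATABRICKS_HOST, DATABRICKS_TOKEN and
# DATABRICKS_CLUSTER_ID from the environment when no explicit config is given.
spark = DatabricksSession.builder.getOrCreate()
print(spark.range(5).count())  # Simple connectivity check; should print 5.
```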
4. Sample pipeline with PySpark code
- Create a new pipeline by clicking New pipeline on the /pipelines page.
- Open the pipeline’s metadata.yaml file and set the config.
- Click + Data loader, then Base template (generic), to add a new data loader block.
- Paste sample PySpark code into the new data loader block (see the sketch after these steps).
- Click the “Run code” button to run the block.
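A minimal sketch of what the data loader block can look like. It assumes, as in Mage’s other Spark integrations, that the active Spark session is exposed to blocks as kwargs['spark'], and it reads from samples.nyctaxi.trips, a sample dataset that is typically available in Databricks workspaces; replace the table name with your own data.

```python
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data(*args, **kwargs):
    # Spark session assumed to be provided by Mage as kwargs['spark'].
    spark = kwargs.get('spark')

    # 'samples.nyctaxi.trips' is a Databricks sample dataset; replace it with
    # your own catalog.schema.table.
    df = spark.sql('SELECT * FROM samples.nyctaxi.trips LIMIT 10')
    return df
```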
- Click + Data exporter, then Base template (generic), to add a new data exporter block.
- Paste sample PySpark code into the new data exporter block (see the sketch after these steps).
- Click the “Run code” button to run the block.
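A minimal sketch of what the data exporter block can look like. The upstream DataFrame from the data loader is passed in as the first argument; main.default.my_table is a placeholder, so point it at a catalog, schema, and table you have write access to.

```python
if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data(df, *args, **kwargs):
    # Write the upstream Spark DataFrame to a Unity Catalog table.
    # 'main.default.my_table' is a placeholder table name.
    df.write.mode('overwrite').saveAsTable('main.default.my_table')
```

Once this block runs successfully, the table should appear in your Unity Catalog, which is what the next step checks.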
5. Verify everything worked
Check the table in your Unity Catalog to verify that the data was written to it correctly.
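One quick way to check programmatically, assuming the same placeholder table name used in the exporter sketch above and the DATABRICKS_* environment variables from step 3:

```python
from databricks.connect import DatabricksSession

# Connects using the DATABRICKS_* environment variables configured earlier.
spark = DatabricksSession.builder.getOrCreate()

# 'main.default.my_table' is the placeholder from the exporter sketch; replace
# it with the table your pipeline actually writes to.
spark.sql('SELECT * FROM main.default.my_table LIMIT 5').show()
```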