This is a guide for using a Databricks Spark cluster with Mage.
Try our fully managed solution to access this advanced feature.
In addition to running Spark pipelines in an AWS EMR cluster or a standalone Spark cluster, Mage also supports running Spark pipelines in a Databricks cluster.
Here is an overview of the steps required to use Mage with a Databricks cluster:
If you get stuck, run into problems, or just want someone to walk you through these steps, please join our Slack.
Set up a Databricks workspace and cluster following the docs:
Contact the Mage team to update your Mage Pro cluster to use the Mage Databricks Docker image.
Set the following environment variables in your Mage Pro cluster to enable connectivity with your Databricks workspace:
DATABRICKS_HOST
The base URL of your Databricks workspace.
Example: https://<your-databricks-instance>.cloud.databricks.com
DATABRICKS_TOKEN
A personal access token (PAT) used for authenticating with Databricks.
You can generate this token in the Databricks UI by navigating to:
Settings > Developer > Access Tokens.
DATABRICKS_CLUSTER_ID
The unique identifier for the Databricks cluster where queries will be executed.
You can find this in your Databricks workspace under: Compute > Clusters.
Refer to Databricks documentation for detailed steps on retrieving the cluster ID.
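Once the three variables are set, you can sanity-check them before starting a pipeline run. The sketch below reads them back and builds the URL of the Databricks Clusters API endpoint (`/api/2.0/clusters/get`), which can be used to confirm the cluster ID is valid; the workspace URL, token, and cluster ID shown are hypothetical placeholders:

```python
import os

# Hypothetical example values -- replace with your own workspace details.
os.environ["DATABRICKS_HOST"] = "https://example.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapiXXXXXXXXXXXXXXXX"
os.environ["DATABRICKS_CLUSTER_ID"] = "0101-000000-abcdefgh"

host = os.environ["DATABRICKS_HOST"].rstrip("/")
cluster_id = os.environ["DATABRICKS_CLUSTER_ID"]

# Databricks Clusters API endpoint for inspecting a single cluster.
# Sending a GET here with an "Authorization: Bearer <token>" header
# verifies that the host, token, and cluster ID all line up.
endpoint = f"{host}/api/2.0/clusters/get?cluster_id={cluster_id}"
print(endpoint)
```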
Click New pipeline on the /pipelines page.
Click + Data loader, then Base template (generic), to add a new data loader block.
Click + Data exporter, then Base template (generic), to add a new data exporter block.
Check the table in your Unity Catalog to verify whether the data was written to it correctly.
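The two generic blocks together form a simple load-then-export flow: the loader returns data, and Mage passes that data to the exporter. The sketch below illustrates that shape; in a real Mage project the `@data_loader` and `@data_exporter` decorators come from `mage_ai`, so they are defined here as no-op stand-ins, and the Unity Catalog table name is a hypothetical example:

```python
# Stand-ins for Mage's decorators so this sketch runs outside a project.
def data_loader(fn):
    return fn

def data_exporter(fn):
    return fn

@data_loader
def load_data(**kwargs):
    # Data loader block: return the rows the exporter receives as input.
    return [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]

@data_exporter
def export_data(rows, **kwargs):
    # Data exporter block: in a real pipeline this would write the data
    # to a Unity Catalog table on the Databricks cluster, for example via
    # spark.createDataFrame(rows).write.saveAsTable(table).
    table = "main.default.example_table"  # hypothetical table name
    return f"{len(rows)} rows -> {table}"

print(export_data(load_data()))
```

After a run, checking the target table in Unity Catalog (Catalog Explorer in the Databricks UI) confirms the exporter wrote the data.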