Only in Mage Pro. Try our fully managed solution to access this advanced feature.
Set up
Here is an overview of the steps required to use Mage with a Databricks cluster:

1. Set up Databricks cluster
2. Use Mage databricks docker image
3. Configure environment variables
4. Sample pipeline with PySpark code
5. Verify everything worked
1. Set up Databricks cluster
Set up a Databricks workspace and cluster following the Databricks docs.

2. Use Mage databricks docker image
Contact the Mage team to update your Mage Pro cluster to use the Mage databricks docker image.

3. Configure environment variables
Set the following environment variables in your Mage Pro cluster to enable connectivity with your Databricks workspace:

- `DATABRICKS_HOST`: The base URL of your Databricks workspace.
  Example: https://<your-databricks-instance>.cloud.databricks.com
- `DATABRICKS_TOKEN`: A personal access token (PAT) used for authenticating with Databricks. You can generate this token in the Databricks UI by navigating to Settings > Developer > Access Tokens.
- `DATABRICKS_CLUSTER_ID`: The unique identifier for the Databricks cluster where queries will be executed. You can find this in your Databricks workspace under Compute > Clusters. Refer to the Databricks documentation for detailed steps on retrieving the cluster ID.
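As a quick sanity check before running a pipeline, you can confirm the three variables are set from any Python block or script. This is a minimal sketch; the helper name `missing_databricks_vars` is ours, not part of Mage or Databricks:

```python
import os

# The three variables described above; all must be set for Mage to reach
# the Databricks workspace.
REQUIRED_VARS = ('DATABRICKS_HOST', 'DATABRICKS_TOKEN', 'DATABRICKS_CLUSTER_ID')


def missing_databricks_vars(env=None):
    """Return the names of any required Databricks variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == '__main__':
    missing = missing_databricks_vars()
    if missing:
        print('Missing Databricks settings:', ', '.join(missing))
    else:
        print('All Databricks environment variables are set.')
```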
4. Sample pipeline with PySpark code
- Create a new pipeline by clicking New pipeline in the /pipelines page.
- Open the pipeline’s metadata.yaml file and set the config.
- Click + Data loader, then Base template (generic) to add a new data loader block.
- Paste the following sample code in the new data loader block:
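A minimal sketch of such a data loader is shown below, with some assumptions: inside a Mage block the `data_loader` decorator is already in scope (the fallback here only makes the snippet runnable elsewhere), the SparkSession is assumed to be routed to your cluster via the environment variables above, and `SELECT 1 AS ok` is a placeholder query:

```python
if 'data_loader' not in globals():
    # Stand-in so the snippet runs outside Mage; inside a Mage block the
    # real decorator is provided by the platform.
    def data_loader(func):
        return func


@data_loader
def load_data(*args, **kwargs):
    # Assumes the environment variables above let this SparkSession reach
    # the remote Databricks cluster.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Placeholder query; replace with your own table or SQL.
    return spark.sql('SELECT 1 AS ok')
```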
- Click the “Run code” button to run the block.
- Click + Data exporter, then Base template (generic) to add a new data exporter block.
- Paste the following sample code in the new data exporter block:
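A hedged sketch of such an exporter follows; Mage’s actual template may differ. It writes the upstream block’s DataFrame to a Delta table, and the target name `main.default.mage_sample` is a placeholder:

```python
if 'data_exporter' not in globals():
    # Stand-in so the snippet runs outside Mage; inside a Mage block the
    # real decorator is provided by the platform.
    def data_exporter(func):
        return func


@data_exporter
def export_data(df, *args, **kwargs):
    # `df` is the output of the upstream data loader block. The target
    # table name is a placeholder; adjust catalog/schema/table to taste.
    df.write.format('delta').mode('overwrite').saveAsTable('main.default.mage_sample')
```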
- Click the “Run code” button to run the block.