
# Databricks

> This guide explains how to use a Databricks Spark cluster with Mage.

export const ProOnly = ({button = 'Get started for free', description = 'Try our fully managed solution to access this advanced feature.', source = 'documentation', title = 'Only in Mage Pro.'}) => <a href={`https://cloud.mage.ai/sign-up?source=${source}`} className="block my-4 px-5 py-4 overflow-hidden rounded-xl flex gap-3 border border-emerald-500/20 bg-emerald-50/50 dark:border-emerald-500/30 dark:bg-emerald-500/10" target="_blank">
    <div style={{
  display: 'flex',
  alignItems: 'center',
  width: '100%'
}}>
      <div className="text-sm prose min-w-0 text-emerald-900 dark:text-emerald-200" style={{
  flex: 1
}}>
        {title}
        <p className="normal">{description}</p>
      </div>

      <div> </div>

      <div>
        <ProButton label={button} href={`https://cloud.mage.ai/sign-up?source=${source}`} />
      </div>
    </div>
  </a>;

export const ProButton = ({href, label = 'Get started with Mage Pro for free', source = 'documentation'}) => <div style={{
  height: 32,
  position: 'relative'
}}>
    <a target="_blank" className="group px-4 py-1.5 relative inline-flex items-center text-sm font-medium rounded-full" href={href ?? `https://cloud.mage.ai/sign-up?source=${source}`}>
      <span className="absolute inset-0 bg-primary-dark dark:bg-primary-light/10 border-primary-light/30 rounded-full dark:border group-hover:opacity-[0.9] dark:group-hover:border-primary-light/60">
      </span>

      <div className="mr-0.5 space-x-2.5 flex items-center">
        <span class="z-10 text-white dark:text-primary-light">
          {label}
        </span>

        <svg width="3" height="24" viewBox="0 -9 3 24" class="h-5 rotate-0 overflow-visible text-white/90 dark:text-primary-light">
          <path d="M0 0L3 3L0 6" fill="none" stroke="currentColor" stroke-width="1.5" stroke-linecap="round"></path>
        </svg>
      </div>
    </a>
  </div>;

<ProOnly source="databricks" />

In addition to AWS EMR clusters and standalone Spark clusters, Mage can run
Spark pipelines on a Databricks cluster.

## Set up

Here is an overview of the steps required to use Mage with a Databricks cluster:

1. [Set up Databricks cluster](#1-set-up-databricks-cluster)
2. [Use Mage databricks docker image](#2-use-mage-databricks-docker-image)
3. [Configure environment variables](#3-configure-environment-variables)
4. [Sample pipeline with PySpark code](#4-sample-pipeline-with-pyspark-code)
5. [Verify everything worked](#5-verify-everything-worked)

If you get stuck, run into problems, or just want someone to walk you through
these steps, please join our [Slack](https://www.mage.ai/chat).

### 1. Set up Databricks cluster

Set up a Databricks workspace and cluster following the docs:

* [Create a workspace using the account console](https://docs.databricks.com/administration-guide/account-settings-e2/workspaces.html)
* [Create a cluster](https://docs.databricks.com/clusters/create-cluster.html)

### 2. Use Mage databricks docker image

Contact the Mage team to update your Mage Pro cluster to use the Mage Databricks Docker image.

### 3. Configure environment variables

Set the following environment variables in your Mage Pro cluster to enable connectivity with your Databricks workspace:

* **`DATABRICKS_HOST`**\
  The base URL of your Databricks workspace.\
  Example: `https://<your-databricks-instance>.cloud.databricks.com`

* **`DATABRICKS_TOKEN`**\
  A personal access token (PAT) used for authenticating with Databricks.\
  You can generate this token in the Databricks UI by navigating to:\
  **Settings** > **Developer** > **Access Tokens**.

* **`DATABRICKS_CLUSTER_ID`**\
  The unique identifier for the Databricks cluster where queries will be executed.\
  You can find this in your Databricks workspace under: **Compute** > **Clusters**.\
  Refer to [Databricks documentation](https://docs.databricks.com/aws/en/workspace/workspace-details#cluster-url-and-id) for detailed steps on retrieving the cluster ID.
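
Before running a pipeline, it can help to confirm that all three variables are visible to the process that runs your blocks. The snippet below is a minimal sketch of such a check, not part of Mage itself; the helper name `missing_databricks_vars` is hypothetical, and only the variable names are taken from the list above.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_CLUSTER_ID")


def missing_databricks_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_databricks_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All Databricks environment variables are set.")
```

Running this from a scratch block or a terminal in the Mage environment shows which variables, if any, still need to be configured.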

### 4. Sample pipeline with PySpark code

1. Create a new pipeline by clicking `New pipeline` on the `/pipelines` page.
2. Open the pipeline's `metadata.yaml` file and set the following config:
   ```yaml
   cache_block_output_in_memory: true
   run_pipeline_in_one_process: true
   ```
3. Click `+ Data loader`, then `Base template (generic)` to add a new data loader
   block.
4. Paste the sample code below into the new data loader block.
5. Click the `Run code` button to run the block.

```python theme={"system"}
from databricks.connect import DatabricksSession

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@data_loader
def load_data(*args, **kwargs):
    spark = DatabricksSession.builder.remote().getOrCreate()

    data = [("John", 28), ("Anna", 23), ("Mike", 35)]
    columns = ["Name", "Age"]

    df = spark.createDataFrame(data, columns)
    return df


@test
def test_output(output, *args) -> None:
    """
    Template code for testing the output of the block.
    """
    assert output is not None, 'The output is undefined'
```
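
The sample above calls `remote()` without arguments, which relies on the connection details being picked up from the environment configured in step 3. If you would rather pass them explicitly, the following sketch shows one way to do it. It assumes that `remote()` accepts `host`, `token`, and `cluster_id` keyword arguments, as described in the Databricks Connect documentation; the helper names `connection_kwargs` and `create_session` are hypothetical.

```python
import os


def connection_kwargs(env=None):
    """Collect explicit connection arguments from the step 3 variables.

    Raises KeyError if one of the variables is not set.
    """
    env = os.environ if env is None else env
    return {
        "host": env["DATABRICKS_HOST"],
        "token": env["DATABRICKS_TOKEN"],
        "cluster_id": env["DATABRICKS_CLUSTER_ID"],
    }


def create_session(env=None):
    """Build a session with explicit arguments instead of a bare remote() call."""
    # Imported lazily so connection_kwargs() works without Databricks Connect installed.
    from databricks.connect import DatabricksSession

    return DatabricksSession.builder.remote(**connection_kwargs(env)).getOrCreate()
```

Inside the data loader, `spark = create_session()` would then replace the `DatabricksSession.builder.remote().getOrCreate()` line. A missing variable surfaces as a `KeyError` naming the variable rather than as a connection failure later on.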

6. Click `+ Data exporter`, then `Base template (generic)` to add a new data
   exporter block.
7. Paste the sample code below into the new data exporter block.
8. Click the `Run code` button to run the block.

```python theme={"system"}
if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data(df, *args, **kwargs):
    df.write.format("delta").mode("append").saveAsTable("user_table")

    return df
```

### 5. Verify everything worked

Check `user_table` in your Unity Catalog to verify that the data was written to it correctly.
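
The same check can also be done in code. The snippet below is a rough sketch that reads the table back through Databricks Connect and reports which of the sample rows are missing. The helper names `missing_rows` and `read_user_table` are hypothetical, and the sketch assumes `user_table` resolves in the default catalog and schema of your session.

```python
# Rows written by the sample data loader in step 4.
EXPECTED_ROWS = [("John", 28), ("Anna", 23), ("Mike", 35)]


def missing_rows(table_rows, expected=EXPECTED_ROWS):
    """Return the expected (name, age) pairs that are absent from table_rows."""
    present = {(name, age) for name, age in table_rows}
    return [row for row in expected if row not in present]


def read_user_table(table_name="user_table"):
    """Fetch (Name, Age) pairs from the table through Databricks Connect."""
    # Imported lazily so missing_rows() can be used without Databricks Connect installed.
    from databricks.connect import DatabricksSession

    spark = DatabricksSession.builder.remote().getOrCreate()
    rows = spark.table(table_name).select("Name", "Age").collect()
    return [(row["Name"], row["Age"]) for row in rows]
```

Calling `missing_rows(read_user_table())` returns an empty list when every sample row is present. Because the exporter writes with `mode("append")`, each additional run adds the same rows again, so the check looks for presence rather than an exact row count.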
