
# Spark

## Add credentials

1. Create a new pipeline or open an existing pipeline.
2. Expand the left side of your screen to view the file browser.
3. Scroll down and click on a file named `io_config.yaml`.
4. Enter the following keys and values under the key named `default`. If you
   have multiple profiles, add them under whichever profile is relevant to you.

For a **local Spark session**:

```yaml theme={"system"}
version: 0.1.1
default:
  SPARK_HOST: local
  SPARK_METHOD: session
  SPARK_SCHEMA: default
```

For a **remote Spark cluster**, set `SPARK_HOST` to the Spark master URL, for example `spark://host:7077`. Any value accepted by `SparkSession.builder.master(...)` is valid, including `local[*]` for a multi-threaded local session. You can also set `SPARK_METHOD` (for example, `session`) to control how the session is created, and `SPARK_SCHEMA` to choose the default database/schema.
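
As a quick sanity check before saving a profile, the sketch below classifies a `SPARK_HOST` value by the kind of Spark master URL it looks like. `classify_spark_host` is a hypothetical helper written for illustration; it is not part of Mage or PySpark, and it only recognizes a few common master URL forms.

```python
import re

# Hypothetical helper for illustration only; not part of Mage or PySpark.
# Matches `local`, `local[4]`, `local[*]`, and `local[4,2]`-style values.
_LOCAL_PATTERN = re.compile(r'^local(\[(\*|\d+)(,\d+)?\])?$')


def classify_spark_host(host: str) -> str:
    """Return a rough label for a SPARK_HOST value."""
    host = host.strip()
    if _LOCAL_PATTERN.match(host):
        return 'local'
    if host.startswith('spark://'):
        return 'standalone'
    if host == 'yarn':
        return 'yarn'
    if host.startswith('k8s://'):
        return 'kubernetes'
    # Anything else is left for Spark itself to accept or reject.
    return 'unknown'


for value in ('local', 'local[*]', 'spark://host:7077'):
    print(value, '->', classify_spark_host(value))
```

A result of `unknown` does not necessarily mean the value is invalid; it only means this sketch does not recognize it.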


## Using a Python block

1. Create a new pipeline or open an existing pipeline.
2. Add a data loader, transformer, or data exporter block (the code snippet
   below is for a data loader).
3. Select `Generic (no template)`.
4. Enter this code snippet (note: change the `config_profile` from `default` if
   you have a different profile):

```python theme={"system"}
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.spark import Spark
from os import path
from pandas import DataFrame

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_spark(**kwargs) -> DataFrame:
    query = 'SELECT 1'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    loader = Spark.with_config(ConfigFileLoader(config_path, config_profile))
    return loader.load(query)
```

5. Run the block.

### Export a dataframe to Spark

```python theme={"system"}
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.spark import Spark
from os import path
from pandas import DataFrame

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_spark(df: DataFrame, **kwargs) -> None:
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'
    table_name = 'your_table_name'
    database = 'default'

    loader = Spark.with_config(ConfigFileLoader(config_path, config_profile))
    loader.export(df, table_name=table_name, database=database, if_exists='replace')
```

## Notes

* Spark runs in-process; ensure PySpark and any required cluster dependencies are installed in your Mage environment.
* For local development, `SPARK_HOST: local` typically creates a session with `SparkSession.builder.master('local').getOrCreate()`.
* Use `SPARK_SCHEMA` to set the default database/schema for queries and exports.
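
To make the notes above concrete, here is a minimal sketch of how a profile's keys could map onto a PySpark session. It is an assumption about the general shape, not Mage's actual implementation: `session_options` and `build_session` are hypothetical names, and the fallback values (`local` and `default`) are chosen for the sketch only. PySpark is imported lazily so the option-mapping part runs without it.

```python
from typing import Dict


def session_options(profile: Dict[str, str]) -> Dict[str, str]:
    """Map io_config-style keys to session settings (hypothetical helper)."""
    return {
        # Fallbacks are assumptions for this sketch, not documented defaults.
        'master': profile.get('SPARK_HOST', 'local'),
        'schema': profile.get('SPARK_SCHEMA', 'default'),
    }


def build_session(profile: Dict[str, str]):
    """Create a SparkSession from a profile; requires PySpark to be installed."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    options = session_options(profile)
    spark = SparkSession.builder.master(options['master']).getOrCreate()
    # Make the configured schema the current database for later queries.
    spark.catalog.setCurrentDatabase(options['schema'])
    return spark


profile = {'SPARK_HOST': 'local', 'SPARK_METHOD': 'session', 'SPARK_SCHEMA': 'default'}
print(session_options(profile))
```

Calling `build_session(profile)` in an environment with PySpark installed would start or reuse an in-process session, which is why the cluster dependencies need to be present in the Mage environment.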
