Add credentials

  1. Create a new pipeline or open an existing pipeline.
  2. Expand the left side of your screen to view the file browser.
  3. Scroll down and click on a file named io_config.yaml.
  4. Enter the following keys and values under the key named default. (You can have multiple profiles; add the credentials under whichever profile is relevant to you.)
For a local Spark session:
version: 0.1.1
default:
  SPARK_HOST: local
  SPARK_METHOD: session
  SPARK_SCHEMA: default
For a remote Spark cluster, set SPARK_HOST to any valid SparkSession.builder.master(...) value, such as the cluster's master URL (for example, spark://host:7077) or local[*]. You can also set SPARK_METHOD (e.g., session) and SPARK_SCHEMA to control how the session is created and which default database/schema is used.
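
For example, a profile for a standalone cluster might look like the following (the master URL is a placeholder; replace it with your cluster's address):

version: 0.1.1
default:
  SPARK_HOST: "spark://host:7077"
  SPARK_METHOD: session
  SPARK_SCHEMA: default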

Using a Python block

  1. Create a new pipeline or open an existing pipeline.
  2. Add a data loader, transformer, or data exporter block (the code snippet below is for a data loader).
  3. Select Generic (no template).
  4. Enter this code snippet (note: change config_profile from 'default' if you use a different profile):
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.spark import Spark
from os import path
from pandas import DataFrame

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_spark(**kwargs) -> DataFrame:
    # Replace with the Spark SQL query you want to run
    query = 'SELECT 1'
    # Path to io_config.yaml and the profile that holds your Spark settings
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    # Build a Spark loader from the config profile and run the query
    loader = Spark.with_config(ConfigFileLoader(config_path, config_profile))
    return loader.load(query)
  5. Run the block.
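
The query can be any Spark SQL statement; for example, SELECT * FROM default.my_table (a hypothetical table name) would read from a table registered in the session's catalog.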

Export a dataframe to Spark
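
To write a dataframe to Spark, add a data exporter block (again, select Generic (no template)) and enter the following code, changing config_profile if you use a different profile: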

from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.spark import Spark
from os import path
from pandas import DataFrame

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_spark(df: DataFrame, **kwargs) -> None:
    # Path to io_config.yaml and the profile that holds your Spark settings
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'
    # Destination table and database/schema (replace the placeholder table name)
    table_name = 'your_table_name'
    database = 'default'

    # Build a Spark exporter from the config profile and write the dataframe
    loader = Spark.with_config(ConfigFileLoader(config_path, config_profile))
    loader.export(df, table_name=table_name, database=database, if_exists='replace')
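
Here if_exists='replace' overwrites any existing table with the same name; other modes (such as 'append') may be available depending on your Mage version, so check mage_ai.io.spark for the options it supports.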

Notes

  • With SPARK_METHOD: session, Spark runs in-process with Mage; ensure PySpark and any required cluster dependencies are installed in your Mage environment.
  • For local development, SPARK_HOST: local typically creates a session with SparkSession.builder.master('local').getOrCreate() (see the sketch after these notes).
  • Use SPARK_SCHEMA to set the default database/schema for queries and exports.
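
For reference, the local configuration above is roughly equivalent to creating the session yourself. A minimal sketch in plain PySpark (Mage handles this internally; you do not need to write it):

from pyspark.sql import SparkSession

# SPARK_HOST: local with SPARK_METHOD: session -- an in-process local session
spark = SparkSession.builder.master('local').getOrCreate()

# SPARK_SCHEMA: default -- make 'default' the current database/schema
spark.catalog.setCurrentDatabase('default')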