After completing data transformations, use the data exporter blocks to load the processed data or store a machine learning model in an external data storage system.
Mage natively supports integration with a variety of data storage systems. However, these integrations often
require specific configurations in both the exporter block and the io_config.yml file to ensure seamless operation.
The io_config.yml file typically includes connection details such as host, port, database name, username, and password.
Meanwhile, the exporter block needs to be configured with the appropriate export parameters, such as target table names,
schema details, and conflict resolution strategies.
These blocks are designed to facilitate the movement of transformed data or trained models to external systems.
Configuration parameters might include destination paths, file formats, table names, schemas, and update strategies.
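To make the update strategies concrete, here is a minimal plain-Python sketch (not Mage's API; the `store` dict stands in for a real database) of how the common `if_exists` policies behave:

```python
def export_table(store, table_name, rows, if_exists='replace'):
    """Toy model of exporter conflict-resolution policies.

    store: dict mapping table names to lists of rows (stand-in for a database).
    if_exists: 'replace' drops and rewrites the table, 'append' adds rows,
               'fail' raises if the table already exists.
    """
    if table_name in store:
        if if_exists == 'fail':
            raise ValueError(f'Table {table_name!r} already exists')
        if if_exists == 'append':
            store[table_name].extend(rows)
            return store
    # 'replace' (or the table does not exist yet): overwrite wholesale
    store[table_name] = list(rows)
    return store
```

In the real exporter blocks below, this policy is passed as the `if_exists` argument to `loader.export`.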
Most data exporters include a config_profile parameter set to 'default' by default. This parameter can be customized
to use different configuration profiles if you have multiple profiles or have renamed them.
Supported Data Storage Systems:
Mage supports a wide range of storage systems including PostgreSQL, MySQL, AWS S3, Google Cloud Storage, Azure Blob Storage, and many more.
Each system may have unique requirements and configurations to ensure compatibility and optimal performance.
Configuration in io_config.yml:
This file serves as the central configuration hub for defining connection parameters.
Typical parameters include:
host: The server address of the storage system.
port: The port number for the connection.
database: The name of the target database or data storage container.
username and password: Authentication credentials.
Additional parameters as required by specific storage systems (e.g., SSL settings, API tokens).
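As a sketch of how these parameters fit together, an io_config.yml with two named profiles might look like the following (profile names and values are illustrative; the key names follow Mage's PostgreSQL convention):

```yaml
version: 0.1.1
default:
  POSTGRES_HOST: hostname
  POSTGRES_PORT: 5432
  POSTGRES_DBNAME: postgres
  POSTGRES_USER: username
  POSTGRES_PASSWORD: password
staging:
  POSTGRES_HOST: staging-hostname
  POSTGRES_PORT: 5432
  POSTGRES_DBNAME: postgres_staging
  POSTGRES_USER: staging_username
  POSTGRES_PASSWORD: staging_password
```

Passing `config_profile='staging'` to the exporter's `ConfigFileLoader` selects the second profile instead of `default`.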
Configure the io_config.yml file to connect your Mage pipeline to a Snowflake data warehouse. Some fields are optional, but
depending on how your Snowflake warehouse is configured you may need to enter all of the connection information into the .yml file.
It’s recommended to store sensitive information as Secrets. See the general Secrets documentation
for more information.
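For reference, a Snowflake profile in io_config.yml typically carries the following keys (names follow Mage's io_config template; the optional defaults can be left as null):

```yaml
default:
  SNOWFLAKE_USER: username
  SNOWFLAKE_PASSWORD: password
  SNOWFLAKE_ACCOUNT: account_id.region
  SNOWFLAKE_DEFAULT_WH: null     # Optional default warehouse
  SNOWFLAKE_DEFAULT_DB: null     # Optional default database
  SNOWFLAKE_DEFAULT_SCHEMA: null # Optional default schema
```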
Enter information for the following in the Data Exporter block:
table_name - requires developers to enter the name of their destination table
database - requires developers to enter the name of their destination database
schema - requires developers to enter the name of their destination schema
All other information is handled in the ‘io_config.yml’ file.
Example Code:
```python
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.snowflake import Snowflake
from pandas import DataFrame
from os import path

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_snowflake(df: DataFrame, **kwargs) -> None:
    """
    Template for exporting data to a Snowflake warehouse.
    Specify your configuration settings in 'io_config.yaml'.

    Docs: https://docs.mage.ai/design/data-loading#snowflake
    """
    table_name = 'your_table_name'
    database = 'your_database_name'
    schema = 'your_schema_name'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    with Snowflake.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        loader.export(
            df,
            table_name,
            database,
            schema,
            if_exists='replace',  # Specify resolution policy if table already exists
        )
```
Configure the io_config.yml file to connect your Mage pipeline to Azure Blob Storage. Configure the required Secrets and enter them into the io_config.yml file.
If you need more information on entering secrets see this documentation.
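A minimal Azure Blob Storage profile in io_config.yml might look like this (key names follow Mage's io_config template; the env_var interpolation keeps the actual secrets out of the file):

```yaml
default:
  AZURE_CLIENT_ID: "{{ env_var('AZURE_CLIENT_ID') }}"
  AZURE_CLIENT_SECRET: "{{ env_var('AZURE_CLIENT_SECRET') }}"
  AZURE_STORAGE_ACCOUNT_NAME: "{{ env_var('AZURE_STORAGE_ACCOUNT_NAME') }}"
  AZURE_TENANT_ID: "{{ env_var('AZURE_TENANT_ID') }}"
```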
Enter information for the following in the Data Exporter block:
container_name - requires developers to enter the name of their destination container
blob_path - requires developers to enter their destination blob path
All other information is handled in the io_config.yml file.
Example Code:
```python
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.azure_blob_storage import AzureBlobStorage
from mage_ai.io.config import ConfigFileLoader
from pandas import DataFrame
from os import path

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_azure_blob_storage(df: DataFrame, **kwargs) -> None:
    """
    Template for exporting data to Azure Blob Storage.
    Specify your configuration settings in 'io_config.yaml'.

    Docs: https://docs.mage.ai/design/data-loading
    """
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'
    container_name = 'your_container_name'
    blob_path = 'your_blob_path'

    AzureBlobStorage.with_config(ConfigFileLoader(config_path, config_profile)).export(
        df,
        container_name,
        blob_path,
    )
```
Configure the io_config.yml file to connect your Mage pipeline to a PostgreSQL database. Configure the required Secrets and enter them into the io_config.yml file.
If you need more information on entering secrets see this documentation.
```yaml
POSTGRES_CONNECT_TIMEOUT: 10
POSTGRES_DBNAME: postgres
POSTGRES_SCHEMA: public  # Optional
POSTGRES_USER: username
POSTGRES_PASSWORD: password
POSTGRES_HOST: hostname
POSTGRES_PORT: 5432
```
If exporting from a Docker container to a database running outside the container, use host.docker.internal for POSTGRES_HOST.
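In that case, the host entry in io_config.yml becomes:

```yaml
POSTGRES_HOST: host.docker.internal
```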
Enter information for the following in the Data Exporter block:
schema_name - requires developers to enter the name of their destination schema
table_name - requires developers to enter the name of their destination table
All other information is handled in the io_config.yml file.
Example Code:
```python
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres
from pandas import DataFrame
from os import path

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_postgres(df: DataFrame, **kwargs) -> None:
    """
    Template for exporting data to a PostgreSQL database.
    Specify your configuration settings in 'io_config.yaml'.

    Docs: https://docs.mage.ai/design/data-loading#postgresql
    """
    schema_name = 'your_schema_name'  # Specify the name of the schema to export data to
    table_name = 'your_table_name'  # Specify the name of the table to export data to
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        loader.export(
            df,
            schema_name,
            table_name,
            index=False,  # Specifies whether to include index in exported table
            if_exists='replace',  # Specify resolution policy if table name already exists
        )
```
Unlike other data exporters, Delta Lake exporters are not currently configured through the io_config.yml file.
They contain the necessary configurations within the exporter block itself. Let's break that down.
Storage Options
'AWS_ACCESS_KEY_ID': Your AWS access key ID.
'AWS_SECRET_ACCESS_KEY': Your AWS secret access key.
'AWS_REGION': The AWS region where your S3 bucket is located.
'AWS_S3_ALLOW_UNSAFE_RENAME': This option allows unsafe rename operations on S3, which might be necessary for some workflows.
Remember, Secrets can be stored in Mage’s internal Secrets Manager, in YAML files, or synced directly with cloud secret managers.
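As one illustrative pattern (an assumption, not the only option), the AWS credentials can be read from environment variables at runtime so they never appear in the exporter block itself:

```python
import os


def build_storage_options():
    """Assemble Delta Lake storage_options from environment variables,
    keeping credentials out of the pipeline code itself."""
    return {
        'AWS_ACCESS_KEY_ID': os.environ.get('AWS_ACCESS_KEY_ID', ''),
        'AWS_SECRET_ACCESS_KEY': os.environ.get('AWS_SECRET_ACCESS_KEY', ''),
        'AWS_REGION': os.environ.get('AWS_REGION', ''),
        'AWS_S3_ALLOW_UNSAFE_RENAME': 'true',
    }
```

The returned dict can be passed directly as the `storage_options` argument in the exporter block below.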
Additional Configurations
uri: The S3 URI where the Delta Table is stored.
Example Code:
```python
from deltalake.writer import write_deltalake

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data(df, *args, **kwargs):
    """
    Export data to a Delta Table.

    Docs: https://delta-io.github.io/delta-rs/python/usage.html#writing-delta-tables
    """
    storage_options = {
        'AWS_ACCESS_KEY_ID': '',
        'AWS_SECRET_ACCESS_KEY': '',
        'AWS_REGION': '',
        'AWS_S3_ALLOW_UNSAFE_RENAME': 'true',
    }
    uri = 's3://[bucket]/[key]'
    write_deltalake(
        uri,
        data=df,
        mode='append',  # append or overwrite
        overwrite_schema=False,  # set True to alter the schema when overwriting
        partition_by=[],
        storage_options=storage_options,
    )
```
By configuring these components correctly, you can streamline the loading of data into your chosen storage system,
whether that is a relational database, a data lake, or a machine learning model repository.