Add credentials

  1. Create a new pipeline or open an existing pipeline.
  2. Expand the left side of your screen to view the file browser.
  3. Scroll down and click on a file named io_config.yaml.
  4. Enter the following keys and values under the key named default. You can have multiple profiles; add them under whichever profile is relevant to you.
The storage account name is required. For authentication, you can use either an Azure AD service principal or default credentials.
version: 0.1.1
default:
  AZURE_STORAGE_ACCOUNT_NAME: your_storage_account_name

  # Optional: Service principal (if not using DefaultAzureCredential)
  AZURE_CLIENT_ID: ...
  AZURE_CLIENT_SECRET: ...
  AZURE_TENANT_ID: ...
If AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID are omitted, the client uses DefaultAzureCredential (Azure CLI, environment variables, or managed identity when running in Azure).
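
If you want to check which credential source will be picked up before running a pipeline, you can exercise the same fallback with the azure-identity package directly. This is a minimal sketch, assuming azure-identity is installed; the commented-out ClientSecretCredential mirrors the three optional keys above:

from azure.identity import ClientSecretCredential, DefaultAzureCredential

# Default-credential path: Azure CLI login, environment variables, or a
# managed identity when running inside Azure.
credential = DefaultAzureCredential()

# Service-principal path, equivalent to setting the three optional keys above:
# credential = ClientSecretCredential(
#     tenant_id='...', client_id='...', client_secret='...',
# )

# Request a token for the storage scope to confirm the credential resolves.
token = credential.get_token('https://storage.azure.com/.default')
print('Token acquired, expires at:', token.expires_on)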

Using a Python block

  1. Create a new pipeline or open an existing pipeline.
  2. Add a data loader or transformer block (the code snippet below is for a data loader).
  3. Select Generic (no template).
  4. Enter this code snippet (note: change config_profile from default if you use a different profile):
    from mage_ai.settings.repo import get_repo_path
    from mage_ai.io.config import ConfigFileLoader
    from mage_ai.io.azure_data_lake_storage import AzureDataLakeStorage
    from os import path
    from pandas import DataFrame
    
    if 'data_loader' not in globals():
        from mage_ai.data_preparation.decorators import data_loader
    
    
    @data_loader
    def load_from_azure_data_lake(**kwargs) -> DataFrame:
        config_path = path.join(get_repo_path(), 'io_config.yaml')
        config_profile = 'default'
    
        container_name = '...'  # File system / container name
        file_path = '...'       # Path to file (e.g. folder/data.parquet)
    
        return AzureDataLakeStorage.with_config(
            ConfigFileLoader(config_path, config_profile)
        ).load(container_name, file_path)
    
  5. Run the block. (An optional test function for the block is sketched below.)
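
Optionally, you can append a test function to the same block to validate its output, following the pattern used by Mage's block templates. A minimal sketch; the assertion is only an example check:

if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@test
def test_output(output, *args) -> None:
    # Runs after the block executes; output is the DataFrame returned by the loader.
    assert output is not None, 'The output is undefined'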

Export data to Azure Data Lake Storage Gen2

from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.azure_data_lake_storage import AzureDataLakeStorage
from os import path
from pandas import DataFrame

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_azure_data_lake(df: DataFrame, **kwargs) -> None:
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    container_name = '...'
    file_path = '...'

    AzureDataLakeStorage.with_config(
        ConfigFileLoader(config_path, config_profile)
    ).export(df, container_name, file_path)

Supported formats

The Azure Data Lake Storage Gen2 integration supports loading and exporting the following file formats:
  • .csv
  • .json
  • .parquet
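
The file format is typically selected by the extension in file_path, consistent with the list above (treat this as an assumption and verify against your Mage version). A short sketch reusing one client for different formats; the container and paths are placeholders:

from os import path

from mage_ai.io.azure_data_lake_storage import AzureDataLakeStorage
from mage_ai.io.config import ConfigFileLoader
from mage_ai.settings.repo import get_repo_path

config_path = path.join(get_repo_path(), 'io_config.yaml')
storage = AzureDataLakeStorage.with_config(ConfigFileLoader(config_path, 'default'))

# The extension in the path determines how the data is read or written.
df = storage.load('my-container', 'raw/events.csv')           # load CSV
storage.export(df, 'my-container', 'curated/events.parquet')  # write Parquet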

Permissions

Ensure your Azure AD application (service principal) or managed identity has an appropriate role on the Data Lake Storage Gen2 storage account (i.e. one with the hierarchical namespace enabled), for example:
  • Storage Blob Data Contributor – read and write
  • Storage Blob Data Reader – read-only
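
To confirm a role assignment has propagated, a quick read check against one container can help. A sketch using the Azure SDK directly, assuming DefaultAzureCredential and a placeholder storage account and container name:

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

client = DataLakeServiceClient(
    account_url='https://your_storage_account_name.dfs.core.windows.net',
    credential=DefaultAzureCredential(),
)

# Listing paths requires at least Storage Blob Data Reader on the account or container.
file_system = client.get_file_system_client('my-container')
for item in file_system.get_paths():
    print(item.name)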