Mage orchestration

Setup

If you haven’t set up a project before, check out the setup guide before starting.


Pipelines

http://localhost:3000/pipelines

This page will show all the pipelines in your project.

Core abstraction: Pipeline

A pipeline contains references to all the blocks of code you want to run and the charts for visualizing data, and it organizes the dependencies between blocks.

Learn more about projects and pipelines here.

From this page, you can also create a new pipeline by clicking the [+ New pipeline] button.

Creating a new pipeline takes you to the Pipeline edit page, a notebook-like experience for adding blocks, creating dependencies between blocks, testing code, and visualizing data with charts.

Learn more about the Notebook for building data pipelines


Pipeline runs

http://localhost:3000/pipeline-runs

View all the runs for every pipeline in your current project.

Core abstraction: Run

A run record stores information about when it was started, its status, when it was completed, any runtime variables used in the execution of the pipeline or block, etc.
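
As a rough illustration of the fields listed above, a run record could be modeled like this. The class name, field names, and status strings are hypothetical and chosen only for this sketch; they are not Mage's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class RunRecord:
    """Hypothetical model of a run record (not Mage's real schema)."""

    started_at: datetime
    status: str  # e.g. "running" or "completed"; the values are assumptions
    completed_at: Optional[datetime] = None
    variables: Dict[str, Any] = field(default_factory=dict)  # runtime variables

    def duration_seconds(self) -> Optional[float]:
        """Return the elapsed time, or None while the run is unfinished."""
        if self.completed_at is None:
            return None
        return (self.completed_at - self.started_at).total_seconds()


# Record a run that starts, then completes 30 seconds later.
run = RunRecord(
    started_at=datetime(2024, 1, 1, 9, 0, 0),
    status="running",
    variables={"env": "dev"},
)
run.completed_at = datetime(2024, 1, 1, 9, 0, 30)
run.status = "completed"
print(run.duration_seconds())
```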


Pipeline detail

http://localhost:3000/pipelines/[uuid]

This page contains all the information and history for a single pipeline:

  1. Triggers
  2. Runs
  3. Logs

Triggers

http://localhost:3000/pipelines/example_pipeline/triggers

This page shows all the active and inactive triggers for a single pipeline.

Core abstraction: Trigger

A trigger is a set of instructions that determines when or how a pipeline should run.


Create trigger

http://localhost:3000/pipelines/[uuid]/triggers/[id]/edit

Create a new trigger for this pipeline by clicking the [+ Create] button near the top of the page.

You can configure the trigger to run the pipeline on a schedule, when an event occurs, or when an API is called.

Core abstraction: Schedule

A schedule type trigger will instruct the pipeline to run after a start date and on a set interval.
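
To make "a start date and a set interval" concrete, here is a minimal sketch that lists the run times such a schedule would produce. The function name is hypothetical, and the sketch assumes the first run lands exactly on the start date; Mage's scheduler may behave differently.

```python
from datetime import datetime, timedelta
from typing import Iterator


def scheduled_run_times(
    start: datetime, interval: timedelta, count: int
) -> Iterator[datetime]:
    """Yield `count` run times: the start date, then one per interval."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    for i in range(count):
        yield start + i * interval


# A daily schedule starting on 2024-01-01 at 09:00.
start = datetime(2024, 1, 1, 9, 0)
runs = list(scheduled_run_times(start, timedelta(days=1), 3))
for run_time in runs:
    print(run_time.isoformat())
```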


Core abstraction: Event

An event type trigger will instruct the pipeline to run whenever a specific event occurs.


Core abstraction: API

An API trigger will instruct the pipeline to run whenever a specific API is called.


Trigger detail

On this page, you can start or pause the trigger. Starting the trigger will make it active. Pausing the trigger will prevent it from running the pipeline.

If you have other triggers for this pipeline, pausing one trigger may not stop the pipeline from running, since the other triggers can still run it.


You can also edit the trigger after creating it by clicking the [Edit trigger] button.


Runs

http://localhost:3000/pipelines/example_pipeline/runs

View the pipeline runs and block runs for a pipeline.

Core abstraction: Run

A run record stores information about when it was started, its status, when it was completed, any runtime variables used in the execution of the pipeline or block, etc.

Retry run

WIP


Logs

http://localhost:3000/pipelines/example_pipeline/logs

Browse all logs for a pipeline. You can search and filter logs by log level, block type, block UUID, and more.

Core abstraction: Log

A log is a file that contains system output information.


Backfill

WIP


Monitor

You can monitor many metrics for each of your pipelines and blocks. Soon, you’ll be able to monitor aggregate metrics across all pipelines and blocks.

Read more here.

Concurrency

You can limit the concurrency of block execution to reduce resource consumption.

Global concurrency

You can configure the maximum number of concurrent block runs in the project’s metadata.yaml via queue_config.

queue_config:
  concurrency: 100

The default value of concurrency is 20.

Pipeline level concurrency

You can edit the concurrency_config in each pipeline’s metadata.yaml file to enforce pipeline-level concurrency. Here is an example:

concurrency_config:
  block_run_limit: 5
  pipeline_run_limit: 3

  • block_run_limit: limits the number of concurrent block runs in one pipeline run.
  • pipeline_run_limit: limits the number of concurrent pipeline runs for one pipeline trigger.
  • pipeline_run_limit_all_triggers: limits the number of concurrent pipeline runs across all triggers in a pipeline.
  • on_pipeline_run_limit_reached: chooses whether to wait or skip when the pipeline run limit is reached.
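
The way these settings could combine is sketched below. This is an illustration under stated assumptions, not Mage's scheduler: the function name and the two run counters are hypothetical, and defaulting on_pipeline_run_limit_reached to "wait" is an assumption.

```python
from typing import Any, Mapping


def decide_pipeline_run(
    config: Mapping[str, Any],
    running_for_trigger: int,
    running_for_pipeline: int,
) -> str:
    """Return "start", "wait", or "skip" for a new pipeline run."""
    per_trigger = config.get("pipeline_run_limit")
    all_triggers = config.get("pipeline_run_limit_all_triggers")
    # Assumption: fall back to "wait" when the option is not set.
    on_limit = config.get("on_pipeline_run_limit_reached", "wait")

    limit_reached = (
        per_trigger is not None and running_for_trigger >= per_trigger
    ) or (
        all_triggers is not None and running_for_pipeline >= all_triggers
    )
    return on_limit if limit_reached else "start"


concurrency_config = {"block_run_limit": 5, "pipeline_run_limit": 3}
print(decide_pipeline_run(concurrency_config, 2, 2))  # below the limit
print(decide_pipeline_run(concurrency_config, 3, 3))  # limit reached
```

An unset limit is treated as "no limit" here, so a pipeline with no concurrency_config always starts its runs in this sketch.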

Variable storage

Mage automatically persists the output of block runs on disk. You can specify the path or the storage for block output variables in the following ways.

  • Specify the data directory path via the MAGE_DATA_DIR environment variable. If you use the Mage Docker image, this environment variable is set to /home/src/mage_data by default.
  • If the MAGE_DATA_DIR environment variable is not set, you can set the variables_dir path in the project’s metadata.yaml. Here is an example:
    variables_dir: /home/src/mage_data
    
  • You can also use external storage for the block output variables by specifying the remote_variables_dir path in the project’s metadata.yaml.
    • AWS S3 storage:
      variables_dir: /home/src/mage_data
      remote_variables_dir: s3://bucket/path_prefix
      
    • Google Cloud Storage:
      variables_dir: /home/src/mage_data
      remote_variables_dir: gs://bucket/path_prefix
      
      When using GCS for your remote variables directory, if you run into a “Your default credentials were not found” error, you may need to set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your application_default_credentials.json file with your Google Cloud credentials and the GOOGLE_CLOUD_PROJECT environment variable to your Google Cloud project ID.
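
The lookup order described above can be sketched as follows. The function name is hypothetical, and the metadata is passed in as an already-parsed dictionary. The sketch only reports the local and remote locations; it does not model how Mage chooses between them.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_variable_storage(
    metadata: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (local_dir, remote_dir) for block output variables."""
    env = os.environ if env is None else env
    # MAGE_DATA_DIR is checked first; variables_dir is the fallback.
    local_dir = env.get("MAGE_DATA_DIR") or metadata.get("variables_dir")
    remote_dir = metadata.get("remote_variables_dir")
    return local_dir, remote_dir


metadata = {
    "variables_dir": "/home/src/mage_data",
    "remote_variables_dir": "s3://bucket/path_prefix",
}
print(resolve_variable_storage(metadata, env={}))
```

Passing env explicitly keeps the example independent of the machine it runs on; omit it to read the real environment.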

Variable retention

If you want to clean up old variables in your variable storage, you can set the variables_retention_period config in the project’s metadata.yaml. A valid period must end with “d”, “h”, or “w”.

Example config:

variables_retention_period: 30d

After configuring variables_retention_period in the project’s metadata.yaml, you can run the following command to clean up old variables:

mage clean-cached-variables [project_path]
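
For intuition about the period format, the sketch below converts a value such as 30d into a duration. It assumes the suffixes mean days, hours, and weeks and that the value is a whole number followed by one suffix; the function name is hypothetical and this is not Mage's own parser.

```python
import re
from datetime import timedelta

# Assumed meaning of each suffix.
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_retention_period(period: str) -> timedelta:
    """Convert a period such as "30d" into a timedelta."""
    match = re.fullmatch(r"(\d+)([hdw])", period.strip())
    if match is None:
        raise ValueError(f"invalid retention period: {period!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


retention = parse_retention_period("30d")
print(retention)
```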

Cache block output in memory

By default, Mage persists block output on disk. In the pipeline’s metadata.yaml, you can configure the pipeline to cache block output in memory instead. This feature is currently supported only in standard batch pipelines (without dynamic blocks).

Example config:

cache_block_output_in_memory: true
run_pipeline_in_one_process: true