If you haven’t set up a project before, check out the setup guide before starting.
http://localhost:3000/pipelines
This page will show all the pipelines in your project.
Core abstraction: Pipeline
A pipeline contains references to all the blocks of code you want to run, charts for visualizing data, and organizes the dependencies between blocks of code.
Learn more about projects and pipelines here.
From this page, you can also create a new pipeline by clicking the [+ New pipeline] button.
Creating a new pipeline will take you to the pipeline edit page: a notebook-like experience for adding blocks, creating dependencies between blocks, testing code, and visualizing data with charts.
Learn more about the Notebook for building data pipelines
http://localhost:3000/pipeline-runs
View all the runs for every pipeline in your current project.
Core abstraction: Run
A run record stores information about when it was started, its status, when it was completed, any runtime variables used in the execution of the pipeline or block, etc.
http://localhost:3000/pipelines/[uuid]
This page contains all the information and history for a single pipeline:
http://localhost:3000/pipelines/example_pipeline/triggers
This page shows all the active and inactive triggers for a single pipeline.
Core abstraction: Trigger
A trigger is a set of instructions that determine when or how a pipeline should run.
http://localhost:3000/pipelines/[uuid]/triggers/[id]/edit
Create a new trigger for this pipeline by clicking the [+ Create] button near the top of the page.
You can configure the trigger to run the pipeline on a schedule, when an event occurs, or when an API is called.
Core abstraction: Schedule
A schedule type trigger will instruct the pipeline to run after a start date and on a set interval.
Core abstraction: Event
An event type trigger will instruct the pipeline to run whenever a specific event occurs.
Core abstraction: API
An API trigger will instruct the pipeline to run whenever a specific API is called.
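For illustration, a trigger can also be defined in code. Below is a minimal sketch of a schedule-type trigger, assuming your Mage version supports a triggers.yaml file in the pipeline folder (verify the field names against your version):

```yaml
triggers:
  - name: daily_example_trigger   # display name of the trigger
    schedule_type: time           # time = schedule; event and api are the other types
    schedule_interval: "@daily"   # run once per day
    start_time: 2024-01-01 00:00:00
    status: active                # active or inactive
```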
On this page, you can start or pause the trigger. Starting the trigger will make it active. Pausing the trigger will prevent it from running the pipeline.
If you have other triggers for this pipeline, pausing one trigger may not stop the pipeline from running, since other triggers can still run the pipeline.
You can also edit the trigger after creating it by clicking the [Edit trigger] button.
http://localhost:3000/pipelines/example_pipeline/runs
View the pipeline runs and block runs for a pipeline.
Core abstraction: Run
A run record stores information about when it was started, its status, when it was completed, any runtime variables used in the execution of the pipeline or block, etc.
WIP
http://localhost:3000/pipelines/example_pipeline/logs
Browse all logs for a pipeline. You can search and filter logs by log level, block type, block UUID, and more.
Core abstraction: Log
A log is a file that contains system output information.
WIP
You can monitor many metrics for each of your pipelines and blocks. Soon, you’ll be able to monitor aggregate metrics across all pipelines and blocks.
Read more here.
You can limit the concurrency of the block execution to reduce resource consumption.
You can configure the maximum number of concurrent block runs in the project’s metadata.yaml via queue_config. The default concurrency value is 20.
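For example, a minimal sketch of the relevant section of the project’s metadata.yaml (assuming queue_config accepts a concurrency field; 20 mirrors the default mentioned above):

```yaml
queue_config:
  concurrency: 20   # maximum number of concurrent block runs
```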
Try our fully managed solution to access this advanced feature.
You can configure the global block run concurrency by block UUID in the project’s metadata.yaml via concurrency_config.
Example
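A sketch of what this could look like in the project’s metadata.yaml; the exact nesting of the per-block mapping under concurrency_config is an assumption, so check the docs for your Mage version:

```yaml
concurrency_config:
  block_run_concurrency:   # assumed key name for the per-block-UUID mapping
    block_uuid_1: 1        # at most 1 concurrent execution
    block_uuid_2: 5        # up to 5 concurrent executions
```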
In the example above:
- block_uuid_1 can run at most 1 concurrent execution.
- block_uuid_2 can run up to 5 concurrent executions.

Limitations
This feature is not supported in the following scenarios:
- If run_pipeline_in_one_process is set to true in the pipeline’s metadata.yaml, global concurrency is not enforced.

You can edit the concurrency_config in each pipeline’s metadata.yaml file to enforce pipeline-level concurrency.
Here is an example:
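A minimal sketch of a pipeline’s metadata.yaml using the keys described below (values are illustrative):

```yaml
concurrency_config:
  block_run_limit: 1
  pipeline_run_limit: 1
  pipeline_run_limit_all_triggers: 10
  on_pipeline_run_limit_reached: wait   # wait or skip
```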
The default values of block_run_limit and pipeline_run_limit can be set via the environment variables CONCURRENCY_CONFIG_BLOCK_RUN_LIMIT and CONCURRENCY_CONFIG_PIPELINE_RUN_LIMIT.
- block_run_limit: limits the concurrent block runs in one pipeline run.
- pipeline_run_limit: limits the concurrent pipeline runs in one pipeline trigger.
- pipeline_run_limit_all_triggers: limits the concurrent pipeline runs across all triggers in a pipeline.
- on_pipeline_run_limit_reached: choose whether to wait or skip when the pipeline run limit is reached.

Mage automatically persists the output of block runs on disk. You can specify the path or the storage for block output variables in the following ways:
- Set the MAGE_DATA_DIR environment variable. If you use the Mage Docker image, this environment variable is set to /home/src/mage_data by default.
- If the MAGE_DATA_DIR environment variable is not set, you can set the variables_dir path in the project’s metadata.yaml (see the sketch after this list).
- To use a remote bucket (such as S3 or GCS) as the variable storage, set the remote_variables_dir path in the project’s metadata.yaml (also shown in the sketch below).
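A minimal sketch of both settings in the project’s metadata.yaml (the local path and bucket URI are illustrative placeholders):

```yaml
# Local variable storage, used when MAGE_DATA_DIR is not set
variables_dir: /home/src/mage_data

# Optional remote variable storage, e.g. an S3 bucket
remote_variables_dir: s3://your_bucket/path_prefix
```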
If you use a GCS bucket, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the application_default_credentials.json file containing your Google Cloud credentials, and set the GOOGLE_CLOUD_PROJECT environment variable to your Google Cloud project ID.

If you want to clean up old variables in your variable storage, you can set the variables_retention_period config in the project’s metadata.yaml. The valid period should end with “d” (days), “h” (hours), or “w” (weeks).
Example config:
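For instance, to keep variables for 30 days (the value is illustrative):

```yaml
variables_retention_period: 30d
```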
After configuring variables_retention_period in the project’s metadata.yaml, you can run the following command to clean up old variables:
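The subcommand below is an assumption based on Mage’s documented variable-cleanup workflow; verify it with mage --help in your version:

```bash
mage clean-cached-variables [project_path]
```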
By default, Mage persists block output on disk. In the pipeline’s metadata.yaml, you have the option to configure the pipeline to cache block output in memory instead of persisting it on disk. This feature is currently only supported in standard batch pipelines (without dynamic blocks).
Example config:
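A minimal sketch, assuming the cache_block_output_in_memory flag in the pipeline’s metadata.yaml:

```yaml
cache_block_output_in_memory: true   # keep block output in memory instead of on disk
```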
Keeps track of recently viewed pipelines so you can easily navigate back to them in the Pipelines Dashboard (/pipelines). Adds a RECENTLY VIEWED tab to the Pipelines Dashboard that lists these pipelines.