Create a new pipeline
Each pipeline is represented by a YAML file in a folder named pipelines/ under the Mage project directory.
For example, if your project is named demo_project and your pipeline is named etl_demo, then you'll have a folder structure that looks like this:
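```
demo_project/
  pipelines/
    etl_demo/
      __init__.py
      metadata.yaml
```

Here, etl_demo is the folder you create for the pipeline, and __init__.py and metadata.yaml are the two files you add inside it, as described in the next steps.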
Create a new folder in the demo_project/pipelines/ directory. Name this new folder after the name of your pipeline.
Add 2 files in this new folder:
__init__.py
metadata.yaml
In the metadata.yaml file, add the following content:
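A minimal sketch of that content, assuming the standard top-level fields described under Pipeline attributes below (blocks, name, type, uuid):

```yaml
blocks: []
name: etl_demo
type: python
uuid: etl_demo
```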
Change etl_demo to whatever name you're using for your new pipeline.

Sample pipeline metadata content
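The following is a hedged sketch of a fuller metadata.yaml with three chained blocks; the block names (load_data, transform_data, export_data) and their settings are illustrative assumptions, not a prescribed layout:

```yaml
blocks:
- name: load_data
  downstream_blocks:
  - transform_data
  executor_type: local_python
  language: python
  type: data_loader
  upstream_blocks: []
  uuid: load_data
- name: transform_data
  downstream_blocks:
  - export_data
  executor_type: local_python
  language: python
  type: transformer
  upstream_blocks:
  - load_data
  uuid: transform_data
- name: export_data
  downstream_blocks: []
  executor_type: local_python
  language: python
  type: data_exporter
  upstream_blocks:
  - transform_data
  uuid: export_data
name: etl_demo
type: python
uuid: etl_demo
```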
This sample pipeline metadata.yaml will produce the following block dependencies:
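With the illustrative blocks sketched above, that dependency chain is:

```
load_data → transform_data → export_data
```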

metadata.yaml sections
Pipeline attributes
blocks: An array of blocks that are in the pipeline.
name: Unique name of the pipeline.
type: The type of pipeline. Currently available options are:
  databricks
  integration
  pyspark
  python (most common)
  streaming
uuid: Unique identifier of the pipeline. This UUID must be unique across all pipelines.
description: Optional description of what the pipeline does.
executor_type: Pipeline-level executor type. Supported values:
  ecs
  gcp_cloud_run
  azure_container_instance
  k8s
  local_python (most common)
  pyspark
executor_count: Number of concurrent executors to run the pipeline. Used in streaming pipelines.
executor_config: Optional configuration specific to the selected executor type. Refer to the executor documentation for executor-specific options.
retry_config: Retry configuration at the pipeline level. See documentation for details.
notification_config: Configuration for pipeline notification messages (e.g., on failure or success). See documentation for details.
concurrency_config: Concurrency settings for block execution within the pipeline. See documentation for details.
  block_run_limit: Maximum number of blocks that can run in parallel.
  pipeline_run_limit: Maximum number of concurrent pipeline runs for a single trigger.
  pipeline_run_limit_all_triggers: Maximum number of concurrent pipeline runs across all triggers of the pipeline.
  on_pipeline_run_limit_reached: What happens when the pipeline run limit is reached (e.g., wait or skip).
cache_block_output_in_memory: Whether to cache block output in memory during execution.
run_pipeline_in_one_process: If true, runs all blocks in a single process or k8s pod.
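To show how these pipeline-level attributes fit together, here is a hedged sketch of a metadata.yaml fragment; the specific values, and the retry_config keys shown (retries, delay), are illustrative assumptions rather than recommended settings:

```yaml
name: etl_demo
type: python
uuid: etl_demo
description: Example ETL pipeline        # optional
executor_type: local_python
executor_config: {}                      # executor-specific options go here
retry_config:                            # illustrative values
  retries: 2
  delay: 30
concurrency_config:                      # illustrative values
  block_run_limit: 2
  pipeline_run_limit: 1
cache_block_output_in_memory: false
run_pipeline_in_one_process: false
blocks: []
```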
Block attributes
downstream_blocks: An array of block UUIDs that depend on this current block. These downstream blocks will have access to this current block's data output.
executor_type: The method for running this block of code. Currently available options are:
  ecs
  gcp_cloud_run
  azure_container_instance
  k8s
  local_python (most common)
  pyspark
executor_config: Optional configuration specific to the selected executor type. Refer to the executor documentation for executor-specific options.
language: Programming language used by the block. Supported values:
  python (most common)
  r
  sql
  yaml
name: Unique name of the block.
type: The type of block. Currently available options are:
  chart
  custom (most common)
  data_exporter
  data_loader
  dbt
  scratchpad
  sensor
  transformer
If the type is data_loader, then the file must be in the [project_name]/data_loaders/ folder. It can be nested in any number of subfolders.
upstream_blocks: An array of block UUIDs that this current block depends on. These upstream blocks will pass their data output to this current block.
uuid: Unique identifier of the block. This UUID must be unique within the current pipeline. The UUID corresponds to the name of the file for this block. For example, if the UUID is load_data and the language is python, then the file name will be load_data.py.
retry_config: Retry configuration at the block level. See documentation for details.
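Putting the block attributes together, a single entry in the blocks array of metadata.yaml might look like the sketch below; the block transform_data and its upstream and downstream neighbors (load_data, export_data) are illustrative assumptions:

```yaml
blocks:
- name: transform_data
  uuid: transform_data          # matches the file transformers/transform_data.py
  type: transformer
  language: python
  executor_type: local_python
  upstream_blocks:
  - load_data                   # hypothetical upstream block
  downstream_blocks:
  - export_data                 # hypothetical downstream block
  retry_config:                 # illustrative block-level retry settings
    retries: 1
    delay: 10
```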