These are the fundamental concepts that Mage uses to operate.
- Event (WIP)
- Metric (WIP)
- Partition (WIP)
- Version (WIP)
- Backfill (WIP)
- Service (WIP)
A project is like a repository on GitHub; this is where you write all your code.
Code in a project can be shared across every pipeline in that project.
You can create a new project by running the following command:

```bash
docker run -it -p 6789:6789 -v $(pwd):/home/src \
  mageai/mageai mage init [project_name]
```

If you installed Mage with pip, you can run the CLI directly:

```bash
mage init [project_name]
```
A pipeline contains references to all the blocks of code you want to run, charts for visualizing data, and organizes the dependency between each block of code.
This is what it could look like in the notebook UI:
A block is a file with code that can be executed independently or within a pipeline.
Blocks can depend on each other. A block won’t start running in a pipeline until all its upstream dependencies are met.
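To make the dependency idea concrete, here is a minimal sketch of an upstream loader block feeding a downstream transformer block. It runs outside of Mage, so it defines stub decorators in place of the ones Mage injects at runtime, and returns plain lists where a real block would usually return a pandas DataFrame:

```python
# Stub decorators standing in for the ones Mage injects at runtime;
# in a real project you would not define these yourself.
def data_loader(func):
    return func

def transformer(func):
    return func

@data_loader
def load_users():
    # Upstream block: produces a data product.
    return [{'id': 1}, {'id': 2}, {'id': 3}]

@transformer
def keep_first_two(rows):
    # Downstream block: won't start until load_users has completed;
    # it receives the upstream block's output as its first argument.
    return rows[:2]

# Mage's scheduler wires the blocks together based on the pipeline's
# dependency graph; calling them by hand just mimics the execution order.
result = keep_first_two(load_users())
```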
There are 5 types of blocks:
1. Data loader
2. Transformer
3. Data exporter
4. Scratchpad
5. Sensor
Here is an example of a data loader block that fetches a CSV file from a public URL:

```python
import io

import pandas as pd
import requests
from pandas import DataFrame


@data_loader  # decorator provided by Mage at runtime
def load_data_from_api() -> DataFrame:
    url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
    response = requests.get(url)
    return pd.read_csv(io.StringIO(response.text), sep=',')
```
Each block file is stored in a folder that matches its respective type (e.g. transformer blocks are stored in the project's `transformers/` folder).
A sensor is a block that continuously evaluates a condition until it’s met or until a period of time has elapsed.
If there is a block with a sensor as an upstream dependency, that block won’t start running until the sensor has evaluated its condition successfully.
Sensors can check for anything. Common examples include:
- Does a table exist?
- Does a partition of a table exist (e.g. `ds = 2022-12-31`)?
- Does a file in a remote location exist?
- Has another pipeline finished running successfully?
- Has a block from another pipeline finished running successfully?
- Has a pipeline run or block run failed?
Here is an example of a sensor that will keep checking to see if the pipeline `transform_users` has finished running successfully for the current execution date:

```python
from mage_ai.orchestration.run_status_checker import check_status


@sensor
def check_condition(**kwargs) -> bool:
    # Returns True once transform_users has a successful pipeline run
    # for the current execution date.
    return check_status(
        'transform_users',
        kwargs['execution_date'],
    )
```

Note: this example uses a helper function called `check_status` that handles the logic of retrieving the status of a pipeline run for `transform_users` on the current execution date.
Every block produces data after it's been executed. These are called data products in Mage.
Data validation occurs whenever a block is executed.
Additionally, each data product produced by a block can be automatically partitioned, versioned, and backfilled.
Some examples of data products produced by blocks:
- 📋 Dataset/Table in a database, data warehouse, etc.
- 🖼️ Image
- 📹 Video
- 📝 Text file
- 🎧 Audio file
A trigger is a set of instructions that determine when or how a pipeline should run. A pipeline can have 1 or more triggers.
There are 3 types of triggers:
A schedule-type trigger will instruct the pipeline to run after a start date and on a set interval.
Currently, pipelines can be scheduled at the following frequencies:
- Run exactly once
- Hourly
- Daily
- Weekly
- Monthly
- Every N minutes (coming soon)
An event-type trigger will instruct the pipeline to run whenever a specific event occurs.
For example, you can have a pipeline start running when a database query is finished executing or when a new object is created in Amazon S3 or Google Storage.
You can also trigger a pipeline using your own custom event by making a `POST` request to the `http://localhost/api/events` endpoint with a custom event payload.
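As a sketch, a custom event could be posted with a few lines of Python. The payload field names below are illustrative assumptions, not a documented Mage event schema:

```python
import json
from urllib import request

# Illustrative payload -- these field names are assumptions, not a
# documented Mage event schema.
payload = {'event_name': 'orders_table_updated', 'source': 'my_app'}

req = request.Request(
    'http://localhost/api/events',  # the events endpoint from the docs
    data=json.dumps(payload).encode('utf-8'),
    headers={'Content-Type': 'application/json'},
    method='POST',
)
# request.urlopen(req) would actually send it; omitted here so the
# sketch doesn't require a running Mage server.
```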
An API-type trigger will instruct the pipeline to run after a specific API call is made.
You can make a POST request to an endpoint provided in the UI when creating or editing a trigger. You can optionally include runtime variables in your request payload.
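The runtime variables mentioned above would travel in the request body. The payload shape below is an assumption for illustration, not a documented schema, and the real endpoint URL comes from the trigger's page in the UI:

```python
import json

# Hypothetical request body for an API-type trigger call -- the key
# names ('pipeline_run', 'variables') are assumptions, not a
# documented Mage schema.
body = json.dumps({
    'pipeline_run': {
        'variables': {
            'start_date': '2022-12-01',
            'env': 'staging',
        },
    },
})
# This JSON string would be POSTed to the endpoint shown in the UI
# when you create or edit the trigger.
```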
Every time a pipeline or a block is executed (outside of the notebook while building the pipeline and block), a run record is created in a database.
A run record stores information such as when the run started, its status, when it completed, and any runtime variables used in the execution of the pipeline or block.
There are 2 types of runs: pipeline runs and block runs.
A pipeline run contains information about the entire pipeline execution.
Every time a pipeline is executed, each block in the pipeline is executed and potentially creates a block run record.
A log is a file that contains system output information.
It's created whenever a pipeline or block is run.
Logs can contain information about the internal state of a run, as well as text output by loggers or `print` statements in your code.
Logs are stored on disk wherever Mage is running. However, you can configure where you want log files written to (e.g. Amazon S3, Google Storage, etc).