Frequently Asked Questions
Here are some frequently asked questions about Mage and our best answers.
What is Mage?
Mage is an open-source data pipeline tool for transforming and integrating data.
🧙 A mage is someone who uses magic. Advanced technology is indistinguishable from magic.
We’re on a mission to make AI technology more accessible by building data tools for engineers and scientists.
Find out more about our story: https://www.mage.ai/blog/mage-heros-journey-fantasy-epic-on-how-a-startup-rose-from-the-ashes
Who is Mage for?
Mage was built with data engineers and data scientists in mind, but it isn’t limited to those roles; other data professionals can find value in the tool as well.
How do I get started?
You can get started quickly by installing Mage using Docker (recommended), pip, or conda. Click here for details.
How much does Mage cost?
Mage is free as long as you self-host it (e.g. on AWS, GCP, Azure, or DigitalOcean).
What are Mage’s design principles?
Our four core design principles that differentiate Mage are:
- Easy developer experience
- Engineering best practices built in
- Data is a first-class citizen
- Scaling is made simple
What makes Mage different from other tools?
Features that set Mage apart (other tools may eventually add some of these):
- Mix and match SQL and Python in data pipeline tasks.
- UI/IDE for building and managing data pipelines.
- Data centric: the pipeline engine was designed and built solely for moving and transforming data. This makes datasets a first-class citizen, enabling native features such as partitioning, versioning, backfilling, data validation, testing, and data quality monitoring.
- Extensible: the tool was designed and built with developers in mind, making it easy to add new functionality to the source code or through plugins.
- Scalable: the tool can handle very large datasets while transforming or charting the data.
- Production ready: a pipeline runs exactly the same in development as it does in production, and deploying the tool and managing its infrastructure is simple, unlike Airflow.
- Modular: every block (cell) you write is a standalone, interoperable file that can be reused in other pipelines or code bases; see the example below.
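For illustration, here is a minimal sketch of what a standalone transformer block file can look like in a Mage project. The file path, column name, and cleanup logic are hypothetical; the decorator import guard follows Mage's standard block template.

```python
# transformers/clean_users.py (hypothetical block file in a Mage project)
if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@transformer
def transform(data, *args, **kwargs):
    # 'data' is the output of the upstream block (e.g. a pandas DataFrame).
    # Drop rows missing an 'email' value -- a hypothetical cleanup step.
    return data.dropna(subset=['email'])


@test
def test_output(output, *args) -> None:
    # Mage runs @test functions after the block executes.
    assert output is not None, 'The output is undefined'
```

Because the block is just a Python file, it can also be imported and reused from other pipelines or code bases.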
How is Mage different from Fivetran?
Check out our blog post Mage vs. Fivetran.
How is Mage different from Airbyte?
Check out our blog post Mage vs. Airbyte.
Why use Mage to build data pipelines?
Mage provides an interactive notebook with engineering best practices built in, which makes prototyping and building production-ready pipelines much easier.
Mage supports writing pipelines in multiple languages, including Python, SQL, and R.
Mage natively supports multiple types of pipelines, such as:
- Standard batch pipelines
- Data integration pipelines
- Streaming pipelines
- Spark pipelines
- dbt pipelines
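As a sketch of what a standard batch pipeline block looks like, here is a minimal Python data loader block; the file path and CSV URL are placeholders, and the import guard follows Mage's standard block template.

```python
# data_loaders/load_example_csv.py (hypothetical block file)
import pandas as pd

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data(*args, **kwargs):
    # Load a CSV into a pandas DataFrame; downstream blocks receive this return value.
    url = 'https://example.com/data.csv'  # placeholder URL
    return pd.read_csv(url)
```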
What programming languages does Mage support?
We currently support SQL, Python, R, and PySpark. Coming soon: Spark SQL.
Can I use Mage with Spark?
Yes! Here is a step-by-step tutorial for using Mage with Spark on EMR.
How is Mage different from SageMaker?
SageMaker is used to train machine learning models and serve them via an API.
Mage is an engine for running data pipelines that can move and transform data. That data can then be stored anywhere (e.g. S3) and used to train models in SageMaker.
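For example, a data exporter block at the end of a Mage pipeline can write its output to S3, where SageMaker can pick it up for training. Below is a minimal sketch modeled on Mage's S3 data exporter template; the bucket name, object key, and config profile are placeholders, and exact module paths may vary between Mage versions.

```python
# data_exporters/export_to_s3.py (hypothetical block file)
from os import path

from mage_ai.data_preparation.repo_manager import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.s3 import S3

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_s3(df, **kwargs):
    # AWS credentials are read from the project's io_config.yaml.
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    bucket_name = 'your-bucket'        # placeholder
    object_key = 'training/data.csv'   # placeholder
    S3.with_config(ConfigFileLoader(config_path, 'default')).export(
        df,
        bucket_name,
        object_key,
    )
```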
How is Mage different from Databricks?
Databricks provides infrastructure for running Spark, along with notebooks that can run your code on Spark.
Mage can execute your code on a Spark cluster managed by AWS, GCP, or even Databricks.
Can I get alerts for pipeline run statuses?
Here is a doc to help you set up alerting for pipeline status updates in Slack.
How can I contribute to Mage?
We love and welcome community contributions! Here is a doc to get you started.
How do I request new features?
To request a feature, create a “Feature request” using the New issue button on GitHub from this link, or join our feature-request Slack channel.
Can’t find what you’re looking for? Ask a question here or join our Slack for additional support!