Frequently Asked Questions
Here are some frequently asked questions about Mage and our best answers.
What is Mage?
Mage is an open-source data pipeline tool for transforming and integrating data.
🧙 A mage is someone who uses magic. Advanced technology is indistinguishable from magic.
We’re on a mission to make AI technology more accessible by building data tools for engineers and scientists.
Find out more about our story: https://www.mage.ai/blog/mage-heros-journey-fantasy-epic-on-how-a-startup-rose-from-the-ashes
Who is the ideal user for this tool?
Our tool was built with data engineers and data scientists in mind, but is not limited to those roles. Other data professionals could find value in the tool.
How difficult is Mage to set up?
You can quickly and easily get started by installing Mage using Docker (recommended), pip, or conda. Click here for details.
How much does Mage cost?
Mage is free as long as you self-host it (on AWS, GCP, Azure, or DigitalOcean).
How is Mage’s data pipeline engine different from Airflow, Dagster, etc.?
Our four core design principles that differentiate us are:
- Easy developer experience
- Engineering best practices built-in
- Data is a first-class citizen
- Scaling is made simple
Features that set us apart (some of the others might eventually have these features):
- Mix and match SQL and Python in data pipeline tasks.
- UI/IDE for building and managing data pipelines.
- Data-centric: we designed and built a pipeline engine only for moving and transforming data. This lets us treat datasets as first-class citizens, enabling native features such as partitioning, versioning, backfilling, data validation, testing, and data quality monitoring.
- Extensible: we designed and built the tool with developers in mind, making sure it’s really easy to add new functionality to the source code or through plug-ins.
- Scalable: the tool can handle very large datasets while transforming the data or charting it.
- Production-ready: when you build your data pipeline, it runs exactly the same in development as it does in production. Deploying the tool and managing the infrastructure in production is simple, unlike with Airflow.
- Modular: every block/cell you write is a standalone file that is interoperable, meaning it can be used in other pipelines or in other codebases.
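To make the "Modular" point concrete, here is a minimal sketch in plain Python. It does not use Mage's actual API: the function names (`drop_incomplete_rows`, `add_full_name`, `run_pipeline`) and the list-of-dicts data are hypothetical and chosen only for illustration. The idea it shows is that a block written once as a standalone unit can be reused, unchanged, in more than one pipeline.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical types for this sketch; not part of Mage.
Row = Dict[str, object]
Block = Callable[[List[Row]], List[Row]]


def drop_incomplete_rows(rows: List[Row]) -> List[Row]:
    """Standalone block: keep only rows that have no missing (None) values."""
    return [row for row in rows if None not in row.values()]


def add_full_name(rows: List[Row]) -> List[Row]:
    """Standalone block: add a full_name column built from first and last."""
    return [{**row, "full_name": f"{row['first']} {row['last']}"} for row in rows]


def run_pipeline(blocks: Iterable[Block], rows: List[Row]) -> List[Row]:
    """Run blocks in order, passing each block's output to the next block."""
    for block in blocks:
        rows = block(rows)
    return rows


raw_users = [
    {"first": "Ada", "last": "Lovelace"},
    {"first": "Alan", "last": None},
]
raw_orders = [
    {"order_id": 1, "total": 9.5},
    {"order_id": 2, "total": None},
]

# The same drop_incomplete_rows block is reused in two different pipelines.
users = run_pipeline([drop_incomplete_rows, add_full_name], raw_users)
orders = run_pipeline([drop_incomplete_rows], raw_orders)
```

In Mage itself, each block would live in its own file and be wired together through the UI or pipeline configuration; the sketch only mirrors the reuse pattern.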
What’s the difference between Mage and Fivetran?
Check out our blog post, Mage vs. Fivetran.
What’s the difference between Mage and Airbyte?
Check out our blog post, Mage vs Airbyte.
What’s the difference between Mage and Prefect?
Mage provides an interactive notebook with built-in engineering best practices for building pipelines, which makes prototyping and building production-ready pipelines much easier.
Mage supports writing pipelines in multiple languages, including Python, SQL, and R.
Mage supports multiple types of pipelines natively such as:
- Standard batch pipelines
- Data integration pipelines
- Streaming pipelines
- Spark pipelines
- dbt pipelines
What languages does Mage support?
We currently support SQL, Python, R, and PySpark. Coming soon: Spark SQL.
Does Mage integrate with Spark?
Yes! Here is a step-by-step tutorial to use Mage with Spark on EMR.
What’s the difference between Mage and SageMaker?
SageMaker is used to train machine learning models and serve them via an API.
Mage is an engine for running data pipelines that can move and transform data. That data can then be stored anywhere (e.g. S3) and used to train models in SageMaker.
What’s the difference between Mage and Databricks?
Databricks provides infrastructure to run Spark. It also provides notebooks that can run your code in Spark.
Mage can execute your code in a Spark cluster, managed by AWS, GCP, or even Databricks.
How do I send pipeline notifications to Slack?
Here is a doc to help you set up alerting for pipeline status updates in Slack.
How can I contribute or request features?
We love and welcome community contributions! Here is a doc to get you started.
To request features, add a “Feature request” using the New issue button in GitHub from this link, or join our feature-request Slack channel.
Can’t find what you’re looking for? Ask a question here or join our Slack for additional support!