Overview

In this tutorial, we’ll create a data pipeline with the following steps:

1. Load data from an online endpoint

We’ll use a Python block to load data from an online endpoint: a CSV file containing restaurant user transactions. We’ll also run a test to make sure the data is clean.
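As a sketch of what this block might contain, assuming Mage’s standard data loader template and that the demo’s CSV is hosted in the mage-ai/datasets GitHub repo (swap in your own endpoint if it differs):

import io

import pandas as pd
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@data_loader
def load_data_from_api(*args, **kwargs):
    # Assumed dataset location; replace with your own endpoint if needed
    url = 'https://raw.githubusercontent.com/mage-ai/datasets/master/restaurant_user_transactions.csv'
    response = requests.get(url)
    response.raise_for_status()
    return pd.read_csv(io.StringIO(response.text), sep=',')


@test
def test_output(output, *args) -> None:
    # A simple cleanliness check: the block returned a non-empty DataFrame
    assert output is not None, 'The output is undefined'
    assert len(output) > 0, 'The output is empty'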

2. Clean column names and add a new column

We’ll use a Python block to transform the data by cleaning the columns and creating a new column, number of meals, that counts the number of meals for each user.
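A minimal transformer sketch for this step. The user_id column name is an assumption about the raw data; inspect the loaded DataFrame and adjust it to the actual column:

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@transformer
def transform(df, *args, **kwargs):
    # Clean column names: strip whitespace, lowercase, spaces -> underscores
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(' ', '_')
    )
    # New column: number of meals (transactions) per user
    df['number_of_meals'] = df.groupby('user_id')['user_id'].transform('count')
    return df


@test
def test_output(output, *args) -> None:
    assert 'number_of_meals' in output.columns, 'number_of_meals column is missing'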

3. Write the transformed data to a local DuckDB database

Finally, we’ll write the transformed data to a local DuckDB table.
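A sketch of the exporter block using the duckdb Python package directly; the database file name (mage_demo.duckdb) and table name (restaurant_transactions) are placeholders, not names fixed by the demo:

import duckdb

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_to_duckdb(df, *args, **kwargs):
    # Open (or create) a local single-file DuckDB database
    con = duckdb.connect('mage_demo.duckdb')
    # DuckDB can query a pandas DataFrame registered as a view
    con.register('transactions', df)
    con.execute(
        'CREATE OR REPLACE TABLE restaurant_transactions AS '
        'SELECT * FROM transactions'
    )
    con.close()

Once the pipeline runs, you can sanity-check the table from a Python shell with duckdb.connect('mage_demo.duckdb').sql('SELECT COUNT(*) FROM restaurant_transactions').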

If you haven’t created a Mage project before, follow the setup guide before starting this tutorial.

Quickstart

Want to dive in? Simply run the following command to clone a pre-built repo:

git clone https://github.com/mage-ai/etl-demo mage-etl-demo \
&& cd mage-etl-demo \
&& cp dev.env .env && rm dev.env \
&& docker compose up

Then navigate to http://localhost:6789 in your browser to see the pipeline in action!

🎉 Congratulations!

You’ve successfully built an end-to-end ETL pipeline that loaded data, transformed it, and exported it to a database.

Now you’re ready to raid the dungeons and find magical treasures with your new powers!

If you have more questions or ideas, get real-time help in our live support Slack channel.