ETL pipeline tutorial
Build a data pipeline that loads restaurant data, transforms it, then exports it to a DuckDB database. 🦆
Overview
In this tutorial, we’ll create a data pipeline with the following steps:
Load data from an online endpoint
We’ll use a Python block to load data from an online endpoint: a CSV file containing restaurant user transactions. We’ll also run a test to make sure the data is clean.
Clean column names and add a new column
We’ll use a Python block to transform the data by cleaning the column names and creating a new column, number of meals, that counts the number of meals for each user.
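The transformation step can be sketched like this. The sample frame and the cleaned column names (lowercase, underscores for spaces) are assumptions for illustration, not the exact schema of the hosted dataset.

```python
import pandas as pd

# Hypothetical sample mirroring the loaded transactions data
df = pd.DataFrame({
    "user ID": [1, 1, 2],
    "meal type": ["breakfast", "dinner", "lunch"],
    "price": [10.5, 22.0, 15.0],
})


def transform(df):
    # Clean column names: lowercase them and replace spaces with underscores
    df.columns = [c.lower().replace(" ", "_") for c in df.columns]
    # Count each user's meals and attach the count as a new column
    df["number_of_meals"] = df.groupby("user_id")["user_id"].transform("count")
    return df


df = transform(df)
```

Using `groupby(...).transform("count")` keeps the result aligned with the original rows, so the per-user count can be assigned directly as a new column without a merge.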
Write the transformed data to a local DuckDB database
Finally, we’ll write the transformed data to a local DuckDB table.
If you haven’t created a Mage project before, follow the setup guide before starting this tutorial.
Quickstart
Want to dive in? Simply run the following command to clone a pre-built repo:
Then navigate to http://localhost:6789 in your browser to see the pipeline in action!
Tutorial
🎉 Congratulations!
You’ve successfully built an end-to-end ETL pipeline that loaded data, transformed it, and exported it to a database.
Now you’re ready to raid the dungeons and find magical treasures with your new powers!
If you have more questions or ideas, get real-time help in our live support Slack channel.