Kafka streaming pipeline
Build pipelines that ingest data from event streaming sources like Kafka.
Set up Kafka
If you don’t already have Kafka set up, here is a quick guide on how to run and use Kafka locally:
Using Kafka locally
- In your terminal, clone this repository: `git clone https://github.com/wurstmeister/kafka-docker.git`.
- Change directory into that repository: `cd kafka-docker`.
- Edit the `docker-compose.yml` file so it matches the single-broker setup sketched after this list.
- Start Docker. If you encounter the error `command not found: docker-compose`, use `docker compose` instead of `docker-compose` (both variants appear in the command sketch after this list).
- Start a terminal session in the running Kafka container.
- Create a topic.
- List all available topics in the Kafka instance.
- Start a producer on the topic named `test`.
- Send messages to the topic named `test` by typing JSON strings in the terminal (note that Kafka messages in Mage are assumed to be in JSON format).
- Open another terminal and start a consumer on the topic named `test`. The consumer should print each JSON message you sent through the producer.

The commands for these steps are sketched after the compose file below.
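A minimal sketch of the `docker-compose.yml`, assuming the single-broker layout from the kafka-docker project, with the broker reachable at `localhost:9092` from the host and at `kafka:9093` from other containers on the same Docker network:

```yaml
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper:3.4.6
    ports:
      - "2181:2181"
  kafka:
    build: .
    ports:
      - "9092:9092"
    expose:
      - "9093"
    environment:
      # INSIDE is used by containers on the Docker network (kafka:9093),
      # OUTSIDE is used by clients on the host machine (localhost:9092).
      KAFKA_LISTENERS: INSIDE://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9093,OUTSIDE://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```

And a sketch of the shell commands for the remaining steps. Exact container names and script flags vary by image and Kafka version (older releases use `--zookeeper zookeeper:2181` instead of `--bootstrap-server`), so treat these as a starting point:

```bash
# Start the containers; use `docker compose up -d` if docker-compose is not installed.
docker-compose up -d

# Start a terminal session in the running Kafka container
# (check the actual container name with `docker ps`).
docker exec -it kafka-docker_kafka_1 bash

# Inside the container: create a topic named `test`.
kafka-topics.sh --create --topic test --partitions 1 --replication-factor 1 \
  --bootstrap-server localhost:9092

# List all available topics in the Kafka instance.
kafka-topics.sh --list --bootstrap-server localhost:9092

# Start a producer on the topic `test`, then type JSON strings such as
# {"key": "value1"} and press Enter to publish each one.
kafka-console-producer.sh --topic test --bootstrap-server localhost:9092

# In a second terminal session inside the container: start a consumer on `test`.
# It prints every JSON message published to the topic.
kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
```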
Original source of instructions.
Build streaming pipeline
Start Mage
Using Kafka locally in a Docker container
Start Mage using Docker. Run the command below to start the Mage container in network mode so it can reach the Kafka container; change the environment variable arguments depending on your cloud provider.
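A sketch of that command, based on Mage's standard `docker run` invocation with the Kafka network attached; the AWS variables and project name are placeholders:

```bash
# Run Mage on the same Docker network as Kafka so `kafka:9093` is reachable.
# The --env values are placeholders; swap them for your cloud provider's credentials.
docker run -it -p 6789:6789 -v $(pwd):/home/src \
  --env AWS_ACCESS_KEY_ID=your_access_key \
  --env AWS_SECRET_ACCESS_KEY=your_secret_access_key \
  --network kafka-docker_default \
  mageai/mageai /app/run_app.sh mage start your_project_name
```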
If the network named `kafka-docker_default` doesn’t exist, create a new network, then check that it exists:
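```bash
# Create the network used by kafka-docker if it does not exist yet.
docker network create kafka-docker_default

# Check that the network exists.
docker network ls
```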
If Mage, running in a Docker container, can’t connect to Kafka running in another local Docker container, do the following:
- Clone Mage: `git clone https://github.com/mage-ai/mage-ai.git`.
- Change directory into Mage: `cd mage-ai`.
- Edit the `docker-compose.yml` file so that Mage’s services join the Kafka network (a sketch of one possible change follows below).
- Run the following script in your terminal: `./scripts/dev.sh`.
This runs Mage in development mode, which starts it in a Docker container using `docker compose` instead of `docker run`.
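A minimal sketch of one possible `docker-compose.yml` change, assuming the goal is simply to have Mage’s containers join the existing `kafka-docker_default` network:

```yaml
# Sketch: add a networks section to mage-ai/docker-compose.yml so the Mage
# services use the existing Kafka network instead of creating their own.
networks:
  default:
    external: true
    name: kafka-docker_default
```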
Using Kafka without a Docker container
Start Mage using Docker. If you haven’t done this before, refer to the setup guide.
Create a new pipeline
- Open Mage in your browser.
- Click + New pipeline, then select Streaming.
- Add a data loader block, select Kafka, and paste the source config (a sketch appears after this list).
  - By default, `bootstrap_server` is set to `localhost:9092`. If you’re running Mage in a Docker container, `bootstrap_server` should be `kafka:9093`.
  - Messages are consumed from the source in micro-batch mode for better efficiency. The default batch size is 100; you can adjust the batch size in the source config.
- Add a transformer block and paste the transformer code (a sketch appears after this list).
- Add a data exporter block, select OpenSearch, and paste the exporter config (a sketch appears after this list).
  - Change `host` to match your OpenSearch domain’s endpoint.
  - Change `index_name` to match the index you want to export data into.
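The block contents below are sketches based on Mage’s standard templates; field names may differ slightly across Mage versions, so adjust them to your environment. For the Kafka data loader, the source config looks roughly like this (the `consumer_group` value is just an illustrative name):

```yaml
connector_type: kafka
bootstrap_server: "localhost:9092"   # use "kafka:9093" when Mage runs in Docker
topic: test
consumer_group: unique_consumer_group
batch_size: 100                      # messages per micro batch; adjust as needed
```

For the transformer block, a pass-through sketch close to Mage’s streaming transformer template:

```python
from typing import Dict, List

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(messages: List[Dict], *args, **kwargs):
    """
    Receive a micro batch of messages (already decoded from JSON) and return
    the transformed batch. This sketch passes messages through unchanged.
    """
    for message in messages:
        print(message)
    return messages
```

For the OpenSearch data exporter, a config sketch (the `host` below is a placeholder):

```yaml
connector_type: opensearch
host: https://search-your-domain.us-west-2.es.amazonaws.com/
index_name: python-test-index
```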
Test pipeline
Open the streaming pipeline you just created, and in the right side panel near the bottom, click the Execute pipeline button to test the pipeline. You should see output showing messages being consumed from the topic.
Publish messages using Python
- Open a terminal on your local workstation.
- Install kafka-python.
- Open a Python shell and write code like the sketch after this list to publish messages.
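Install the client with `pip install kafka-python`. A minimal publisher sketch, assuming the local broker and the `test` topic from the setup above:

```python
import json

from kafka import KafkaProducer

# Connect to the local broker started by kafka-docker.
producer = KafkaProducer(bootstrap_servers='localhost:9092')

# Publish a few JSON-encoded messages to the `test` topic;
# Mage assumes Kafka messages are in JSON format.
for i in range(3):
    message = {'id': i, 'name': f'user_{i}'}
    producer.send('test', json.dumps(message).encode('utf-8'))

# Flush buffered messages before exiting the shell.
producer.flush()
```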
Once you run the code snippet above, go back to your streaming pipeline in Mage; you should see the published messages show up in the pipeline’s output.
Consume messages using Python
If you want to programmatically consume messages from a Kafka topic, here is a code snippet:
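A minimal consumer sketch using kafka-python, again assuming the local broker and the `test` topic:

```python
import json

from kafka import KafkaConsumer

# Subscribe to the `test` topic and read from the beginning of the log.
consumer = KafkaConsumer(
    'test',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',
)

# message.value holds the raw bytes of each Kafka message.
for message in consumer:
    data = json.loads(message.value.decode('utf-8'))
    print(message.topic, message.partition, message.offset, data)
```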
Run in production
- Create a trigger.
- Once the trigger is created, click the Start trigger button at the top of the page to make the streaming pipeline active.