Run dbt-spark against a pySpark session.
The following procedure demonstrates how to run dbt-spark
with a pySpark session.
-
Build a Mage docker image with Spark following the instructions given at Build Mage docker image with Spark environment.
-
Run the following command in your terminal to start Mage using docker:
docker run -it --name mage_spark -p 6789:6789 -v $(pwd):/home/src mage_spark \
/app/run_app.sh mage start demo_project
- Create a new pipeline with a name
dbt_spark
, and add aScratchpad
to test out the connection with PySpark, with the following code:
from pyspark.sql import SparkSession
import os
spark = SparkSession.builder.master(os.getenv('SPARK_MASTER_HOST', 'local')).getOrCreate()
spark.sql("show databases;")
It should return results similar to the following when running:
[Stage 0:>
(0 + 1) / 1]
namespace
0 default
- Click the
Terminal
icon on the right side of the Mage UI, and create a dbt projectspark_demo
, with the following commands:
root@488dc9529cf3:/home/src# cd demo_project/dbt
root@488dc9529cf3:/home/src# dbt init spark_demo -s
root@488dc9529cf3:/home/src# touch spark_demo/profiles.yml
- On the left side of the page in the file browser, expand the folder
demo_project/dbt/spark_demo/
. Click the file namedprofiles.yml
, and add the following settings to this file:
spark_demo:
target: dev
outputs:
dev:
type: spark
method: session
schema: default
host: local
-
Save the
profiles.yml
file by pressingCommand (⌘) + S
, then close the file by pressing the X button on the right side of the file namedbt/spark_demo/profiles.yml
. -
Click the button
dbt model
, and choose the optionNew model
. Entermodel_1
as theModel name
, andspark_demo/models/example
as the folder location. -
In the dbt block named
model_1
, next to the labelTarget
at the top, choosedev
in the dropdown list. You can also checkManually enter target
, and enterdev
in the input field. -
Paste the following SQL into the dbt block
model_1
:
{{ config(materialized='table') }}
with source_data as (
select 1 as id
union all
select null as id
)
select *
from source_data
Click the Compile & preview
button to execute this new model, which would
generate the results similar to the following:
Was this page helpful?