dbt · data · ~30 mins

Model dependencies and parallelism in dbt - Mini Project: Build & Apply

Model dependencies and parallelism
📖 Scenario: You are working on a data project using dbt to transform raw sales data into useful reports. Your project has multiple models that depend on each other. Understanding how to define these dependencies and run models in parallel will help you save time and avoid errors.
🎯 Goal: Build a simple dbt project with three models where one model depends on the other two. Learn how to define dependencies using ref() and understand how dbt runs models in parallel when possible.
📋 What You'll Learn
Create three dbt models: stg_sales.sql, stg_customers.sql, and fct_orders.sql
Use ref() to define dependencies in fct_orders.sql
Configure dbt to run models in parallel
Print the order in which models run to understand dependencies and parallelism
💡 Why This Matters
🌍 Real World
In real data projects, defining model dependencies ensures data is transformed in the correct order. Running independent models in parallel shortens the total run time of the workflow.
💼 Career
Data engineers and analysts use dbt to build reliable data pipelines. Understanding dependencies and parallelism is key to optimizing data workflows and delivering timely insights.
1
Create initial dbt models
Create two dbt models named stg_sales.sql and stg_customers.sql. Each model should select all columns from its raw table: raw_sales for stg_sales and raw_customers for stg_customers. Write the SQL code for both models.
Hint: Use a simple select * from statement in each model.
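
A minimal sketch of the two staging models, assuming raw_sales and raw_customers already exist as tables in the schema your dbt target points at. The models/ folder is the conventional location and is an assumption here. Each model lives in its own file, and the two are shown together only for brevity.

```sql
-- File: models/stg_sales.sql
-- Passes every column of the raw sales table through unchanged.
select * from raw_sales

-- File: models/stg_customers.sql
-- Passes every column of the raw customers table through unchanged.
select * from raw_customers
```

Neither model references the other, so dbt is free to build them at the same time when more than one thread is available.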

2
Create dependent model with ref()
Create a dbt model named fct_orders.sql that depends on stg_sales and stg_customers. Use ref('stg_sales') and ref('stg_customers') to join these two models on customer_id. Select all columns from stg_sales and the customer_name from stg_customers.
Hint: Use {{ ref('model_name') }} to refer to other models inside your SQL.
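
One way the dependent model might look, as a sketch. The step does not specify a join type, so the inner join is an assumption, as are the aliases s and c. Each ref() call resolves to the built relation and also tells dbt that fct_orders must wait for that model.

```sql
-- File: models/fct_orders.sql
-- Depends on stg_sales and stg_customers through ref(),
-- so dbt builds both staging models before this one.
select
    s.*,              -- all columns from stg_sales
    c.customer_name   -- customer name looked up from stg_customers
from {{ ref('stg_sales') }} as s
join {{ ref('stg_customers') }} as c
    on s.customer_id = c.customer_id
```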

3
Configure dbt to run models in parallel
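Create a dbt_project.yml file for the project, then set threads to 4 in the matching connection profile in profiles.yml. The threads setting belongs to the profile, not to dbt_project.yml, and it can also be passed on the command line with dbt run --threads 4. With four threads, dbt can run up to four independent models at the same time.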
Hint: Set threads: 4 under your target's output in profiles.yml, or pass --threads 4 on the command line.
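
A minimal sketch of where the threads setting usually lives: dbt reads it from the connection profile in profiles.yml. The profile name sales_project, the dev target, and the DuckDB connection details are assumptions made for illustration. Substitute the adapter and credentials your project actually uses.

```yaml
# profiles.yml (hypothetical profile; connection details are placeholders)
sales_project:          # must match the `profile:` value in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: duckdb      # assumed adapter, requires dbt-duckdb
      path: dev.duckdb  # assumed local database file
      threads: 4        # up to 4 models may run concurrently
```

The thread count is an upper bound. dbt only runs models concurrently when none of them depends on another that is still building.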

4
Run dbt models and observe execution order
Run the command dbt run --select stg_sales stg_customers fct_orders in your terminal (--select replaces the older --models flag, which is deprecated in current dbt versions). Observe the order in which the models run, then write a print statement that reports it: stg_sales and stg_customers run first (in parallel), then fct_orders runs last.
Hint: Use a simple print() statement to show the order.
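
The step does not say where the print statement should live. One dbt-native option, sketched here, is a small macro that calls the Jinja print() function, which assumes a dbt version that provides it. The macro name print_run_order and its file path are hypothetical.

```sql
{# File: macros/print_run_order.sql (hypothetical macro name) #}
{# Prints the expected execution order to the terminal. #}
{% macro print_run_order() %}
    {{ print("Run order: stg_sales and stg_customers run first (in parallel), then fct_orders runs last") }}
{% endmacro %}
```

Under these assumptions the macro can be invoked with dbt run-operation print_run_order. To see the order dbt actually used, check the numbered START lines in the dbt run output.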