Model dependencies and parallelism in dbt - Mini Project: Build & Apply

Model dependencies and parallelism

📖 Scenario: You are working on a data project using dbt to transform raw sales data into useful reports. Your project has multiple models that depend on each other. Understanding how to define these dependencies and run models in parallel will help you save time and avoid errors.

🎯 Goal: Build a simple dbt project with three models where one model depends on the other two. Learn how to define dependencies using ref() and understand how dbt runs models in parallel when possible.

📋 What You'll Learn

Create three dbt models: stg_sales.sql, stg_customers.sql, and fct_orders.sql

Use ref() to define dependencies in fct_orders.sql

Configure dbt to run models in parallel

Print the order in which models run to understand dependencies and parallelism

💡 Why This Matters

🌍 Real World

In real data projects, defining model dependencies ensures data is transformed in the correct order. Running models in parallel speeds up the workflow, saving time and computing resources.

💼 Career

Data engineers and analysts use dbt to build reliable data pipelines. Understanding dependencies and parallelism is key to optimizing data workflows and delivering timely insights.

Create initial dbt models

Create two dbt models named stg_sales.sql and stg_customers.sql. Each model should select all columns from its raw table: stg_sales reads from raw_sales, and stg_customers reads from raw_customers. Write the SQL code for both models.

dbt

-- Create stg_sales.sql
-- Your SQL code here

-- Create stg_customers.sql
-- Your SQL code here

Need a hint?

Use simple select * from statements for each model.
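To make the staging step concrete, here is a minimal sketch in Python that imitates what happens to a model's select statement once it is built. It uses an in-memory SQLite database instead of a real warehouse, and the column names (order_id, customer_id, amount, customer_name) are assumptions invented for illustration. The only dbt behaviour it relies on is that the default materialization is a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw tables; the column names are assumptions for this sketch.
conn.executescript("""
create table raw_sales (order_id integer, customer_id integer, amount real);
create table raw_customers (customer_id integer, customer_name text);
insert into raw_sales values (1, 10, 25.0), (2, 11, 40.0);
insert into raw_customers values (10, 'Ada'), (11, 'Grace');
""")

# The body of each staging model is just a select statement.
STG_MODELS = {
    "stg_sales": "select * from raw_sales",
    "stg_customers": "select * from raw_customers",
}

# dbt wraps a model's select in DDL; with the default materialization
# that is a view, which is imitated here by hand.
for name, select_sql in STG_MODELS.items():
    conn.execute(f"create view {name} as {select_sql}")

stg_sales_rows = conn.execute("select * from stg_sales order by order_id").fetchall()
print(stg_sales_rows)  # [(1, 10, 25.0), (2, 11, 40.0)]
```

The point of the sketch is that a model file contains only the select; the create statement around it is generated for you.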

Create dependent model with ref()

Create a dbt model named fct_orders.sql that depends on stg_sales and stg_customers. Use ref('stg_sales') and ref('stg_customers') to join these two models on customer_id. Select all columns from stg_sales and the customer_name from stg_customers.

dbt

-- fct_orders.sql
-- Use ref() to join stg_sales and stg_customers on customer_id
-- Your SQL code here

Need a hint?

Use {{ ref('model_name') }} to refer to other models inside your SQL.
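The sketch below illustrates why ref() matters for dependencies: every ref() call in a model names an upstream model, so the set of ref() calls is the set of parents. This is only an illustration in Python. dbt itself renders the Jinja template rather than scanning it with a regular expression, and the model body shown (including the aliases s and c) is one possible answer under the assumption that both staging models expose customer_id.

```python
import re

# Hypothetical model body for fct_orders.sql (aliases and layout are assumed).
FCT_ORDERS_SQL = """
select
    s.*,
    c.customer_name
from {{ ref('stg_sales') }} as s
join {{ ref('stg_customers') }} as c
    on s.customer_id = c.customer_id
"""

# Matches {{ ref('name') }} or {{ ref("name") }} with optional whitespace.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([A-Za-z0-9_]+)['\"]\s*\)\s*\}\}")


def find_refs(sql: str) -> list[str]:
    """Return the model names referenced through ref(), in order of appearance."""
    return REF_PATTERN.findall(sql)


print(find_refs(FCT_ORDERS_SQL))  # ['stg_sales', 'stg_customers']
```

Because fct_orders references both staging models, it cannot start until both have finished. The staging models reference nothing, so neither has to wait for the other.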

Configure dbt to run models in parallel

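Set threads to 4 so that dbt can run up to four independent models at the same time. In dbt Core, threads is a connection setting rather than a project setting: it belongs under the target in your profiles.yml file, or on the command line as dbt run --threads 4. dbt_project.yml does not have a threads key.

dbt

# profiles.yml
# Set threads to 4 under your target for parallelism
# Your YAML code here

Need a hint?

Set threads: 4 inside the target's output block in profiles.yml, next to the connection details.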
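To see how the thread count interacts with dependencies, the following Python sketch groups the three models into "waves" using the standard-library graphlib module. It is a simplification and not dbt's actual scheduler: dbt starts a model as soon as its parents are done and a thread is free, rather than waiting for a whole wave to finish. The function name plan_waves and the DEPENDENCIES mapping are made up for this example.

```python
from graphlib import TopologicalSorter

# Each model maps to the models it selects from through ref().
DEPENDENCIES = {
    "stg_sales": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_sales", "stg_customers"},
}


def plan_waves(dependencies, threads):
    """Group models into waves; a wave holds at most `threads` models."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    pending = []  # models whose parents are done but that have no free thread yet
    waves = []
    while sorter.is_active():
        pending.extend(sorted(sorter.get_ready()))
        wave, pending = pending[:threads], pending[threads:]
        waves.append(wave)
        sorter.done(*wave)  # finishing a wave unlocks the children of its models
    return waves


print(plan_waves(DEPENDENCIES, threads=4))
# [['stg_customers', 'stg_sales'], ['fct_orders']]
print(plan_waves(DEPENDENCIES, threads=1))
# [['stg_customers'], ['stg_sales'], ['fct_orders']]
```

With four threads the two staging models share a wave. With one thread everything runs one model at a time, but fct_orders still comes last because of its dependencies.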

Run dbt models and observe execution order

Run dbt run --select stg_sales stg_customers fct_orders in your terminal (--select is the current name for the older --models flag). Watch the log to see the order in which the models start. Then write a print statement that records that order: stg_sales and stg_customers run first (in parallel), and fct_orders runs last, once both have finished.

dbt

# Print the model execution order below
# Your code here

Need a hint?

Use a simple print() statement to show the order.
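One possible way to write that print statement is sketched below in Python. The execution_order list is typed in by hand from the order described in this step; it is not read from dbt's logs, and the relative order of the two staging models inside the first group may differ between runs.

```python
# Execution order as described in this step; each inner list is one group
# of models that can run at the same time.
execution_order = [["stg_sales", "stg_customers"], ["fct_orders"]]

for step, models in enumerate(execution_order, start=1):
    mode = "in parallel" if len(models) > 1 else "alone"
    print(f"Step {step}: {', '.join(models)} ({mode})")
# Step 1: stg_sales, stg_customers (in parallel)
# Step 2: fct_orders (alone)
```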