Bird
Raised Fist0
dbtdata~10 mins

Building a DAG of models in dbt - Step-by-Step Execution

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Concept Flow - Building a DAG of models
Define base models
Create intermediate models
Reference base models in SQL
Create final models
Reference intermediate models
dbt builds DAG
Run models in dependency order
You start by defining simple base models, then build intermediate models that use those bases, and finally create final models that depend on intermediates. dbt reads these references to build a DAG and runs models in the right order.
Execution Sample
dbt
with base as (
  select * from raw_data
),
intermediate as (
  select * from base where value > 10
),
final as (
  select count(*) as total from intermediate
)
select * from final
This SQL defines a base model selecting raw data, then an intermediate model filtering that base, and a final model aggregating the intermediate, showing how models depend on each other.
Execution Table
StepActionModel CreatedDependenciesResult
1Define base model SQLbaseNoneModel 'base' selects raw_data table
2Define intermediate model SQLintermediatebaseModel 'intermediate' filters 'base' where value > 10
3Define final model SQLfinalintermediateModel 'final' aggregates 'intermediate' results
4dbt parses modelsDAG builtbase -> intermediate -> finalDAG shows dependencies
5dbt runs models in orderbaseNonebase model runs first
6dbt runs models in orderintermediatebaseintermediate runs after base
7dbt runs models in orderfinalintermediatefinal runs last
8Execution completeAll modelsAll dependencies metAll models built successfully
💡 All models executed in dependency order, DAG run complete
Variable Tracker
VariableStartAfter Step 1After Step 2After Step 3Final
baseundefinedDefined as raw_data selectDefinedDefinedUsed by intermediate
intermediateundefinedundefinedDefined as filtered baseDefinedUsed by final
finalundefinedundefinedundefinedDefined as aggregation of intermediateBuilt last
DAGemptybase onlybase -> intermediatebase -> intermediate -> finalComplete DAG
Run Orderemptybasebase -> intermediatebase -> intermediate -> finalExecution done
Key Moments - 3 Insights
Why does dbt run the 'base' model before 'intermediate'?
Because 'intermediate' depends on 'base' as shown in the execution_table rows 5 and 6, dbt runs models respecting dependencies to ensure data is ready.
What happens if a model does not reference any other model?
It is treated as a base model with no dependencies, so dbt runs it first as shown in execution_table row 1 and 5.
How does dbt know the order to run models?
dbt builds a DAG from model references (execution_table row 4), then runs models in topological order to satisfy dependencies.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table, which model runs immediately after 'base'?
Afinal
Bintermediate
Craw_data
DNone
💡 Hint
Check rows 5 and 6 in the execution_table showing run order
At which step does dbt build the DAG of models?
AStep 4
BStep 6
CStep 2
DStep 8
💡 Hint
Look at the 'Action' column in execution_table row 4
If the 'intermediate' model did not reference 'base', what would change in the DAG?
ADAG would be the same
BDAG would have extra edges
CDAG would have no edges between models
DDAG would fail to build
💡 Hint
Check variable_tracker row for DAG showing dependencies
Concept Snapshot
Building a DAG of models in dbt:
- Define base models with no dependencies
- Create intermediate models referencing base models
- Create final models referencing intermediate models
- dbt parses references to build a DAG
- dbt runs models in dependency order
- This ensures data flows correctly through models
Full Transcript
In dbt, you build a Directed Acyclic Graph (DAG) of models by writing SQL models that reference each other. You start with base models that select raw data. Then you create intermediate models that use base models in their SQL. Finally, you create final models that depend on intermediate models. dbt reads these references and builds a DAG showing dependencies. When you run dbt, it executes models in order so each model's dependencies are ready first. This process ensures your data transformations happen in the right sequence. The execution table shows each step: defining models, building the DAG, and running models in order. Variables track model definitions and DAG state. Key moments clarify why order matters and how dbt knows it. The quiz tests your understanding of model order and DAG building.

Practice

(1/5)
1.

What does a DAG represent in dbt?

easy
A. The configuration settings for dbt profiles
B. The syntax rules for writing SQL queries
C. The order in which models depend on each other
D. The list of all tables in the database

Solution

  1. Step 1: Understand what DAG means in dbt context

    A DAG (Directed Acyclic Graph) shows how models are connected by dependencies.
  2. Step 2: Identify the role of DAG in dbt

    dbt uses the DAG to know which models to run first based on dependencies.
  3. Final Answer:

    The order in which models depend on each other -> Option C
  4. Quick Check:

    DAG = model dependency order [OK]
Hint: DAG shows model dependencies and run order [OK]
Common Mistakes:
  • Confusing DAG with SQL syntax
  • Thinking DAG lists all tables
  • Mixing DAG with dbt config files
2.

Which of the following is the correct way to reference another model in a dbt SQL file?

SELECT * FROM ___
easy
A. ref(model_name)
B. ref('model_name')
C. 'ref(model_name)'
D. ref:"model_name"

Solution

  1. Step 1: Recall the syntax for referencing models in dbt

    dbt uses the function ref() with the model name as a string inside parentheses.
  2. Step 2: Check each option for correct syntax

    ref('model_name') uses ref('model_name') which is correct; others have syntax errors or wrong quotes.
  3. Final Answer:

    ref('model_name') -> Option B
  4. Quick Check:

    Use ref('model_name') with quotes [OK]
Hint: Use ref('model_name') with quotes and parentheses [OK]
Common Mistakes:
  • Omitting quotes around model name
  • Using wrong quote types
  • Using colons or other symbols
3.

Given these two models, what is the order dbt will run them?

-- model_a.sql
SELECT * FROM source_table

-- model_b.sql
SELECT * FROM {{ ref('model_a') }}
medium
A. model_a runs first, then model_b
B. model_b runs first, then model_a
C. Both run simultaneously
D. dbt will error due to circular dependency

Solution

  1. Step 1: Identify dependencies from ref()

    model_b references model_a using ref(), so model_b depends on model_a.
  2. Step 2: Determine run order based on dependencies

    dbt runs model_a first, then model_b to ensure data is ready.
  3. Final Answer:

    model_a runs first, then model_b -> Option A
  4. Quick Check:

    Dependency order = model_a before model_b [OK]
Hint: Models run in dependency order: referenced first [OK]
Common Mistakes:
  • Assuming ref() means reverse dependency
  • Thinking models run simultaneously
  • Confusing circular dependency errors
4.

What is wrong with this dbt model code snippet?

SELECT * FROM {{ ref(model_a) }}
medium
A. Model name should be uppercase
B. ref() cannot be used inside SELECT
C. Missing FROM keyword
D. Missing quotes around model name in ref()

Solution

  1. Step 1: Check syntax of ref() usage

    ref() requires the model name as a string with quotes inside the parentheses.
  2. Step 2: Identify the error in the code snippet

    model_a is not quoted, causing a syntax error in dbt compilation.
  3. Final Answer:

    Missing quotes around model name in ref() -> Option D
  4. Quick Check:

    ref('model_name') needs quotes [OK]
Hint: Always put model names in quotes inside ref() [OK]
Common Mistakes:
  • Forgetting quotes around model names
  • Thinking ref() can't be in SELECT
  • Assuming case sensitivity causes error
5.

You have three models: model_x, model_y, and model_z. model_y references model_x, and model_z references both model_x and model_y. Which of the following is the correct order dbt will run these models?

hard
A. model_x, model_y, model_z
B. model_y, model_x, model_z
C. model_z, model_y, model_x
D. model_x, model_z, model_y

Solution

  1. Step 1: Analyze dependencies among models

    model_y depends on model_x; model_z depends on both model_x and model_y.
  2. Step 2: Determine run order respecting dependencies

    model_x runs first (no dependencies), then model_y (depends on model_x), then model_z (depends on both).
  3. Final Answer:

    model_x, model_y, model_z -> Option A
  4. Quick Check:

    Run order respects dependencies [OK]
Hint: Run models so dependencies are built before dependents [OK]
Common Mistakes:
  • Running dependent models before their dependencies
  • Ignoring multiple dependencies
  • Assuming any order works if models reference each other