Building a DAG of models in dbt - Deep Dive

Overview - Building a DAG of models

What is it?

Building a DAG of models means creating a clear map of how different data models depend on each other. In dbt, a DAG (Directed Acyclic Graph) shows the order in which models run based on their dependencies. Each model is like a step in a recipe, and the DAG ensures the steps happen in the right order. This helps organize complex data transformations in a simple, visual way.

Why it matters

Without a DAG, data models might run in the wrong order, causing errors or wrong results. The DAG makes sure each model waits for the models it depends on to finish first. This saves time, avoids mistakes, and helps teams understand how data flows through their system. It also makes debugging and updating models easier because you can see the full chain of dependencies.

Where it fits

Before learning about building a DAG, you should understand basic SQL and how dbt models work. After mastering DAGs, you can learn about advanced dbt features like snapshots, tests, and incremental models. This topic fits in the middle of the dbt learning path, connecting model creation with project organization and execution.

Mental Model

Core Idea

A DAG of models is a map that shows which data models depend on others, ensuring they run in the correct order without loops.

Think of it like...

Imagine building a LEGO castle where some pieces must be placed before others. The DAG is like the instruction booklet that tells you the order to add each piece so the castle stands strong.

Models DAG Structure:

  Model A
    ↓
  Model B
    ↓
  Model C

Each arrow shows that the model below depends on the one above. No arrows loop back up.
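The chain above can be written down as a small dependency map. This is a minimal sketch using Python's standard-library `graphlib`, not dbt's own code; the names `model_a`, `model_b`, and `model_c` are illustrative stand-ins for the boxes in the diagram.

```python
from graphlib import TopologicalSorter

# Each key is a model; its value is the set of models it depends on.
# The names mirror the diagram and are purely illustrative.
dependencies = {
    "model_a": set(),
    "model_b": {"model_a"},
    "model_c": {"model_b"},
}

# A topological order lists every model after the models it depends on.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)  # ['model_a', 'model_b', 'model_c']
```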

Build-Up - 7 Steps

Foundation: Understanding dbt Models Basics

Concept: Learn what a dbt model is and how it represents a SQL query that creates a table or view.

In dbt, a model is a SQL file that defines a transformation. When you run dbt, it turns these SQL files into tables or views in your database. Each model can be simple or complex, but it always produces a dataset you can use later.

Result

You can create a simple model that runs a SQL query and produces a table in your database.

Knowing what a model is helps you see how dbt organizes data transformations as building blocks.

Foundation: Introducing Model Dependencies

Intermediate: What is a DAG in dbt?

Intermediate: Visualizing the DAG in dbt

Intermediate: Handling Complex DAGs with Multiple Dependencies

Advanced: Avoiding and Fixing Cycles in the DAG

Expert: Optimizing DAG Execution with Resource Management

Under the Hood

dbt parses every model file and records each dependency declared with {{ ref() }}. It builds a graph in which nodes are models and edges are dependencies, then checks that the graph has no cycles. At run time, dbt orders the models topologically: models without dependencies run first, then the models that depend on them, and so on. Models with no dependency path between them can run in parallel, up to the number of threads configured for the run.
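To make the parsing step concrete, here is a rough sketch of extracting `ref()` calls and ordering the models. It assumes a hypothetical in-memory project (`MODELS`) and uses a regular expression, which is a simplification: dbt itself renders the Jinja templates rather than pattern-matching the text, and the model names here are invented for illustration.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical in-memory project: model name -> SQL text.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": """
        select *
        from {{ ref('stg_orders') }} o
        join {{ ref('stg_customers') }} c on o.customer_id = c.id
    """,
}

# Matches {{ ref('name') }} or {{ ref("name") }} (single-argument form only).
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_graph(models):
    """Return {model: set of models it depends on}."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


graph = build_graph(MODELS)
run_order = list(TopologicalSorter(graph).static_order())
print(graph["customer_orders"])  # the two staging models
print(run_order[-1])             # customer_orders
```

Both staging models have no dependencies, so either may come first; the only guarantee is that `customer_orders` comes after both.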

Why designed this way?

The DAG design ensures data transformations happen in a logical order, preventing errors from missing data. Directed edges clarify dependency direction, and acyclic structure avoids infinite loops. This approach is common in workflow management because it is simple, reliable, and scalable. Alternatives like cyclic graphs would cause execution deadlocks or require complex handling, which dbt avoids for simplicity and robustness.

DAG Construction Flow:

┌─────────────┐
│ Parse Models│
└──────┬──────┘
       │ Extract refs
       ▼
┌─────────────┐
│ Build Graph │
│ (Nodes &    │
│  Edges)     │
└──────┬──────┘
       │ Check for cycles
       ▼
┌─────────────┐
│ Topological │
│ Sort Models │
└──────┬──────┘
       │ Run models in order
       ▼
┌─────────────┐
│ Execute DAG │
└─────────────┘
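The last two boxes of the flow can be sketched as grouping models into batches that are safe to run together. This is an illustration under assumptions, not dbt's scheduler: the helper `execution_batches` and the model names are hypothetical, and a real run may start a model as soon as its own parents finish rather than waiting for a whole batch.

```python
from graphlib import TopologicalSorter


def execution_batches(graph):
    """Group models into batches; models in one batch do not depend on
    each other, so they could run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


graph = {
    "stg_orders": set(),
    "stg_customers": set(),
    "customer_orders": {"stg_orders", "stg_customers"},
    "revenue_report": {"customer_orders"},
}
batches = execution_batches(graph)
print(batches)
# [['stg_customers', 'stg_orders'], ['customer_orders'], ['revenue_report']]
```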

Myth Busters - 4 Common Misconceptions

Quick: Do you think a model can depend on itself in a DAG? Commit yes or no.

Common Belief: A model can reference itself to update or append data.

Quick: Do you think dbt runs models in the order they appear in files? Commit yes or no.

Common Belief: dbt runs models in the order they are written or saved in the project folder.

Quick: Do you think adding more dependencies always slows down dbt runs? Commit yes or no.

Common Belief: More dependencies always make dbt runs slower because models wait longer.

Quick: Do you think the DAG only matters for large projects? Commit yes or no.

Common Belief: Small projects don’t need a DAG because they have few models.

Expert Zone

The DAG is not just about order but also about understanding data lineage, which helps in impact analysis and debugging.

dbt’s DAG can be extended with tags and selectors to run subsets of models, enabling flexible workflows in production.

Parallel execution depends on your database’s ability to handle concurrent queries; optimizing DAG structure alone is not enough.
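Lineage questions such as "what feeds this model?" and "what breaks if this model changes?" reduce to graph traversals. The sketch below is an assumption-laden illustration with hypothetical helpers `ancestors` and `descendants`; it is similar in spirit to dbt's `+model` and `model+` graph operators for node selection, but it is not how dbt implements them.

```python
def ancestors(graph, model):
    """All models that `model` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(model, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def descendants(graph, model):
    """All models that depend on `model`, directly or indirectly."""
    # Reverse the edges, then reuse the same traversal.
    children = {}
    for node, parents in graph.items():
        for parent in parents:
            children.setdefault(parent, set()).add(node)
    return ancestors(children, model)


# Hypothetical project: model -> set of models it depends on.
graph = {
    "stg_orders": set(),
    "stg_customers": set(),
    "customer_orders": {"stg_orders", "stg_customers"},
    "revenue_report": {"customer_orders"},
}
upstream = ancestors(graph, "customer_orders")
downstream = descendants(graph, "stg_orders")
```

Here `upstream` holds the two staging models, and `downstream` holds every model that would need a rerun after `stg_orders` changes.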

When NOT to use

If your data transformations require cyclic dependencies or iterative processing, a DAG is not suitable. Orchestrators such as Apache Airflow are also DAG-based, so they do not remove this limit; handle the iteration inside a single task or script, or use a specialized graph processing framework.

Production Patterns

In production, teams use the DAG to schedule runs with orchestration tools, monitor model health via lineage, and isolate failures by rerunning only affected downstream models. They also modularize DAGs by business domains for easier maintenance.

Connections

Workflow Orchestration

Building a DAG of models is a specific example of workflow orchestration where tasks depend on each other.

Understanding DAGs in dbt helps grasp how tools like Airflow or Prefect schedule and manage complex workflows.

Project Management Dependencies

The DAG concept parallels task dependencies in project management charts like Gantt charts.

Knowing DAGs clarifies how to plan and sequence tasks in any project to avoid bottlenecks and delays.

Biological Pathways

DAGs resemble biological pathways where reactions depend on previous steps without cycles.

Seeing DAGs as natural processes helps appreciate their role in ensuring orderly progression in complex systems.

Common Pitfalls

#1 Creating circular dependencies between models.

Wrong approach: Model A SQL: select * from {{ ref('model_b') }}; Model B SQL: select * from {{ ref('model_a') }};

Correct approach: Refactor models so one does not depend on the other directly or indirectly, for example: Model A SQL: select * from source_table; Model B SQL: select * from {{ ref('model_a') }};

Root cause: Misunderstanding that references create dependencies that must not form loops.
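The circular pair from pitfall #1 can be checked mechanically. This is a minimal sketch with a hypothetical helper `find_cycle`, built on Python's `graphlib`; dbt reports cycles with its own error message, which is not reproduced here.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(graph):
    """Return the list of models forming a cycle, or None if acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        # graphlib stores the cycle as the second element of err.args;
        # the first and last entries of that list are the same node.
        return err.args[1]
    return None


# Wrong approach: each model refs the other.
circular = {"model_a": {"model_b"}, "model_b": {"model_a"}}
# Correct approach: model_a reads from a source table instead.
fixed = {"model_a": set(), "model_b": {"model_a"}}

cycle = find_cycle(circular)
no_cycle = find_cycle(fixed)
```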

#2 Assuming models run in file order, causing unexpected results.

Wrong approach: Writing models without references and expecting them to run in a specific sequence based on file names.

Correct approach: Use {{ ref() }} to explicitly declare dependencies so dbt knows the correct order.

Root cause: Not realizing dbt relies on the DAG, not file order, to schedule model runs.
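A tiny example shows why file order is the wrong mental model. The file names `a_report` and `z_staging` are hypothetical, chosen so that alphabetical order and dependency order disagree.

```python
from graphlib import TopologicalSorter

# Hypothetical project where alphabetical file order is the wrong run order:
# a_report.sql selects from {{ ref('z_staging') }}.
graph = {
    "a_report": {"z_staging"},
    "z_staging": set(),
}

file_order = sorted(graph)  # what "run in file order" would give
dag_order = list(TopologicalSorter(graph).static_order())

print(file_order)  # ['a_report', 'z_staging']
print(dag_order)   # ['z_staging', 'a_report']
```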

#3 Overloading a single model with too many dependencies, making the DAG complex and slow.

Wrong approach: One model referencing many others unnecessarily, e.g., Model C SQL: select * from {{ ref('model_a') }}, {{ ref('model_b') }}, {{ ref('model_d') }}, ...

Correct approach: Break complex models into smaller, focused models with clear dependencies.

Root cause: Lack of modular design and misunderstanding of how to manage complexity in DAGs.

Key Takeaways

A DAG of models in dbt organizes data transformations by showing dependencies and execution order.

Using {{ ref() }} creates clear dependencies that dbt uses to build the DAG and run models correctly.

The DAG must be acyclic to avoid execution errors caused by circular dependencies.

Visualizing the DAG helps understand complex projects and speeds up debugging and development.

Expert use of DAGs includes optimizing parallel execution and managing large projects with modular design.

Practice

What does a DAG represent in dbt?

easy

A. The configuration settings for dbt profiles

B. The syntax rules for writing SQL queries

C. The order in which models depend on each other

D. The list of all tables in the database

Which of the following is the correct way to reference another model in a dbt SQL file?

SELECT * FROM ___

easy

A. ref(model_name)

B. ref('model_name')

C. 'ref(model_name)'

D. ref:"model_name"

Given these two models, what is the order dbt will run them?

-- model_a.sql
SELECT * FROM source_table

-- model_b.sql
SELECT * FROM {{ ref('model_a') }}

medium

A. model_a runs first, then model_b

B. model_b runs first, then model_a

C. Both run simultaneously

D. dbt will error due to circular dependency

What is wrong with this dbt model code snippet?

SELECT * FROM {{ ref(model_a) }}

medium

A. Model name should be uppercase

B. ref() cannot be used inside SELECT

C. Missing FROM keyword

D. Missing quotes around model name in ref()

Solution

Step 1: Understand what DAG means in dbt context

Step 2: Identify the role of DAG in dbt

Final Answer: C. The order in which models depend on each other

Quick Check:

Solution

Step 1: Recall the syntax for referencing models in dbt

Step 2: Check each option for correct syntax

Final Answer: B. ref('model_name'), written inside Jinja braces in a model file as {{ ref('model_name') }}

Quick Check:

Solution

Step 1: Identify dependencies from ref()

Step 2: Determine run order based on dependencies

Final Answer: A. model_a runs first, then model_b

Quick Check:

Solution

Step 1: Check syntax of ref() usage

Step 2: Identify the error in the code snippet

Final Answer: D. Missing quotes around model name in ref()

Quick Check:

Solution

Step 1: Analyze dependencies among models

Step 2: Determine run order respecting dependencies

Final Answer:

Quick Check: