
Building a DAG of models in dbt - Deep Dive

Overview - Building a DAG of models
What is it?
Building a DAG of models means creating a clear map of how different data models depend on each other. In dbt, a DAG (Directed Acyclic Graph) shows the order in which models run based on their dependencies. Each model is like a step in a recipe, and the DAG ensures the steps happen in the right order. This helps organize complex data transformations in a simple, visual way.
Why it matters
Without a DAG, data models might run in the wrong order, causing errors or wrong results. The DAG makes sure each model waits for the models it depends on to finish first. This saves time, avoids mistakes, and helps teams understand how data flows through their system. It also makes debugging and updating models easier because you can see the full chain of dependencies.
Where it fits
Before learning about building a DAG, you should understand basic SQL and how dbt models work. After mastering DAGs, you can learn about advanced dbt features like snapshots, tests, and incremental models. This topic fits in the middle of the dbt learning path, connecting model creation with project organization and execution.
Mental Model
Core Idea
A DAG of models is a map that shows which data models depend on others, ensuring they run in the correct order without loops.
Think of it like...
Imagine building a LEGO castle where some pieces must be placed before others. The DAG is like the instruction booklet that tells you the order to add each piece so the castle stands strong.
Models DAG Structure:

  Model A
    ↓
  Model B
    ↓
  Model C

Each arrow shows that the model below depends on the one above. No arrows loop back up.
Build-Up - 7 Steps
1
Foundation: Understanding dbt Model Basics
Concept: Learn what a dbt model is and how it represents a SQL query that creates a table or view.
In dbt, a SQL model is a file containing a single SELECT statement that defines a transformation. When you run dbt, it materializes each of these files as a table or view in your database. A model can be simple or complex, but it always produces a dataset that later models can build on.
Result
You can create a simple model that runs a SQL query and produces a table in your database.
Knowing what a model is helps you see how dbt organizes data transformations as building blocks.
2
Foundation: Introducing Model Dependencies
Concept: Models can use other models as inputs by referencing them, creating dependencies.
In dbt, you use the {{ ref('model_name') }} function inside your SQL to refer to another model. This tells dbt that your model depends on that other model. For example, if Model B uses data from Model A, Model B depends on Model A.
Result
You create a chain where Model B waits for Model A to finish before running.
Understanding dependencies is key to controlling the order of model execution.
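To make the idea of "a reference creates a dependency" concrete, here is a minimal Python sketch that pulls model names out of {{ ref() }} calls. It is only an illustration: dbt itself renders the Jinja rather than using a regular expression, and the pattern below deliberately ignores the two-argument form of ref. The function name `extract_refs` and the sample SQL are hypothetical.

```python
import re

# Matches {{ ref('name') }} or {{ ref("name") }}, tolerating extra whitespace.
# Assumption: single-argument ref() only; this is not how dbt parses models.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def extract_refs(sql: str) -> list[str]:
    """Return the referenced model names, in order of first appearance."""
    seen = []
    for name in REF_PATTERN.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical SQL for Model B, which reads from Model A.
model_b_sql = "select id, amount from {{ ref('model_a') }} where amount > 0"

print(extract_refs(model_b_sql))  # ['model_a']
```

Each name returned here corresponds to one arrow pointing into the model in the DAG.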
3
Intermediate: What is a DAG in dbt?
Concept: A DAG is a graph that shows all models and their dependencies without any cycles.
dbt builds a Directed Acyclic Graph (DAG) from your models and their references. Directed means arrows show direction of dependency. Acyclic means no loops exist, so a model cannot depend on itself directly or indirectly. This graph helps dbt know the order to run models.
Result
You get a visual and logical map of model execution order.
Knowing the DAG prevents circular dependencies that cause errors and confusion.
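The way a dependency graph turns into a run order can be sketched with Python's standard-library `graphlib` module. This is an illustration of topological ordering in general, not dbt's own code, and the model names are hypothetical stand-ins for the Model A, Model B, Model C chain shown earlier.

```python
from graphlib import TopologicalSorter

# Each key is a model; its value is the set of models it depends on.
# Hypothetical names mirroring the A -> B -> C chain.
dependencies = {
    "model_a": set(),
    "model_b": {"model_a"},
    "model_c": {"model_b"},
}

# static_order() yields every node only after all of its predecessors.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)  # ['model_a', 'model_b', 'model_c']
```

Because the order is derived from the dependency map alone, renaming or reordering the files would not change it.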
4
Intermediate: Visualizing the DAG in dbt
Before reading on: Do you think dbt can show you the DAG visually or only as text? Commit to your answer.
Concept: dbt provides commands and tools to see the DAG graphically or as a list.
You can run 'dbt docs generate' followed by 'dbt docs serve' to open a local web page that shows your DAG as an interactive lineage graph. Each node is a model, and arrows show dependencies. For a plain-text view, 'dbt ls' lists the resources in your project. Both views help you understand complex projects more easily.
Result
You see a clear, interactive graph of your models and their connections.
Visualizing the DAG makes it easier to spot problems and understand data flow.
5
Intermediate: Handling Complex DAGs with Multiple Dependencies
Before reading on: Do you think a model can depend on multiple models at once? Commit to your answer.
Concept: Models can depend on many other models, creating branches and merges in the DAG.
A model can reference several other models using multiple {{ ref() }} calls. This creates a DAG with branches where one model feeds many others, and merges where one model depends on many. dbt manages this complexity automatically.
Result
Your DAG can represent complex data pipelines with many interconnected models.
Understanding multi-dependencies helps you design scalable and maintainable data workflows.
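Branches and merges are easier to see when the graph is viewed from both directions. The sketch below, written in Python purely for illustration, starts from a parent map (what each model references) and inverts it into a child map (what each model feeds). All model names and the `invert` helper are hypothetical.

```python
from collections import defaultdict

# Parents of each model, as they would be declared through ref() calls.
parents = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},  # merge: two inputs
    "customer_summary": {"stg_customers"},               # second consumer of stg_customers
}


def invert(parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build the child map: for each model, the models that reference it."""
    children = defaultdict(set)
    for model, ups in parents.items():
        children[model]  # touch the key so models with no children still appear
        for up in ups:
            children[up].add(model)
    return dict(children)


children = invert(parents)
print(sorted(children["stg_customers"]))  # ['customer_summary', 'orders_enriched']
```

Here `stg_customers` is a branch point (it feeds two models) and `orders_enriched` is a merge point (it reads from two models).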
6
Advanced: Avoiding and Fixing Cycles in the DAG
Before reading on: Can a DAG have cycles? What happens if it does? Commit to your answer.
Concept: Cycles break the DAG because they create infinite loops in dependencies, which dbt cannot run.
If two or more models reference each other directly or indirectly, dbt will raise an error about circular dependencies. To fix this, you must redesign your models to remove cycles, often by splitting models or rethinking dependencies.
Result
Your project runs without errors and models execute in a clear order.
Knowing how to detect and fix cycles prevents frustrating runtime errors and keeps your data pipeline healthy.
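Cycle detection can be sketched with the same standard-library module used for ordering. The example below is a generic Python illustration, not dbt's implementation or its error message; `find_cycle` and the model names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(dependencies):
    """Return the nodes forming a cycle, or None if the graph is acyclic."""
    sorter = TopologicalSorter(dependencies)
    try:
        sorter.prepare()  # raises CycleError if any cycle exists
    except CycleError as err:
        # The second element of args is the list of nodes in the cycle;
        # its first and last entries are the same node.
        return err.args[1]
    return None


# Two models that reference each other: not a valid DAG.
cyclic = {"model_a": {"model_b"}, "model_b": {"model_a"}}

# After refactoring so model_a reads from a raw table instead.
fixed = {"model_a": set(), "model_b": {"model_a"}}

print(find_cycle(cyclic) is not None)  # True
print(find_cycle(fixed))               # None
```

The refactored graph passes because the dependency now points in one direction only.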
7
Expert: Optimizing DAG Execution with Resource Management
Before reading on: Do you think dbt runs all models one by one or can it run some in parallel? Commit to your answer.
Concept: dbt uses the DAG to run independent models in parallel, optimizing execution time and resource use.
Because the DAG shows dependencies, dbt can run models that don't depend on each other at the same time, up to the number of threads configured for the run. This parallelism speeds up runs. Experts design DAGs to maximize parallel execution while respecting dependencies. They also consider database resource limits and use tags or selectors to control which models run.
Result
Faster dbt runs and efficient use of computing resources.
Understanding DAG execution strategies helps you build performant and scalable data pipelines.
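To see where parallelism comes from, the following Python sketch groups a small graph into "waves" of models whose dependencies are all finished. This is a simplification for illustration: it assumes unlimited workers and waits for a whole wave to finish, whereas a real scheduler can start a model as soon as its own upstream models are done. The function `parallel_batches` and the model names are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group models into waves; each wave has no unfinished dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return batches


dependencies = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "daily_revenue": {"orders_enriched"},
}

print(parallel_batches(dependencies))
# [['stg_customers', 'stg_orders'], ['orders_enriched'], ['daily_revenue']]
```

The two staging models share a wave because neither depends on the other; the longest chain of waves is what limits how fast the run can finish, no matter how many workers are available.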
Under the Hood
dbt parses all model SQL files and extracts the references made with the {{ ref() }} function. It builds a graph where nodes are models and edges are dependencies, then checks the graph for cycles to ensure it is acyclic. During execution, dbt orders the models topologically: models without dependencies run first, then the models that depend on them, and so on. A model becomes eligible to run once all of its upstream models have finished, so models with no dependency path between them can run in parallel, up to the configured number of threads.
Why designed this way?
The DAG design ensures data transformations happen in a logical order, preventing errors from missing data. Directed edges clarify dependency direction, and acyclic structure avoids infinite loops. This approach is common in workflow management because it is simple, reliable, and scalable. Alternatives like cyclic graphs would cause execution deadlocks or require complex handling, which dbt avoids for simplicity and robustness.
DAG Construction Flow:

┌─────────────┐
│ Parse Models│
└──────┬──────┘
       │ Extract refs
       ▼
┌─────────────┐
│ Build Graph │
│ (Nodes &    │
│  Edges)     │
└──────┬──────┘
       │ Check for cycles
       ▼
┌─────────────┐
│ Topological │
│ Sort Models │
└──────┬──────┘
       │ Run models in order
       ▼
┌─────────────┐
│ Execute DAG │
└─────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think a model can depend on itself in a DAG? Commit yes or no.
Common Belief: A model can reference itself to update or append data.
Reality: A model cannot depend on itself through {{ ref() }} because that creates a cycle, which breaks the DAG and causes errors. Incremental models that need to read their own existing rows use {{ this }} instead, which points at the model's current table without adding a dependency edge.
Why it matters: Trying to create self-dependencies causes dbt to fail, stopping your data pipeline and wasting time debugging.
Quick: Do you think dbt runs models in the order they appear in files? Commit yes or no.
Common Belief: dbt runs models in the order they are written or saved in the project folder.
Reality: dbt runs models based on the DAG dependencies, not file order. Models run only after their dependencies finish.
Why it matters: Assuming file order can lead to confusion and errors when models run before their inputs are ready.
Quick: Do you think adding more dependencies always slows down dbt runs? Commit yes or no.
Common Belief: More dependencies always make dbt runs slower because models wait longer.
Reality: While dependencies add order, dbt can run independent models in parallel, so well-designed DAGs can run efficiently even with many dependencies.
Why it matters: Misunderstanding this can lead to unnecessary simplification or poor design of data models.
Quick: Do you think the DAG only matters for large projects? Commit yes or no.
Common Belief: Small projects don't need a DAG because they have few models.
Reality: Even small projects benefit from a DAG to avoid errors and understand model relationships clearly.
Why it matters: Ignoring DAG principles early can cause scaling problems and confusion as projects grow.
Expert Zone
1
The DAG is not just about order but also about understanding data lineage, which helps in impact analysis and debugging.
2
dbt's DAG can be sliced with tags and selectors to run subsets of models, enabling flexible workflows in production.
3
Parallel execution depends on your database's ability to handle concurrent queries; optimizing DAG structure alone is not enough.
When NOT to use
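If your data transformations genuinely require cyclic dependencies or iterative processing, a dbt DAG is not a good fit. Handle the iteration outside the model graph, for example in a recursive SQL query, a procedural script, or a specialized graph processing framework. Note that orchestrators such as Apache Airflow are themselves DAG-based, so they do not remove the acyclic constraint on their own.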
Production Patterns
In production, teams use the DAG to schedule runs with orchestration tools, monitor model health via lineage, and isolate failures by rerunning only affected downstream models. They also modularize DAGs by business domains for easier maintenance.
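Rerunning only the affected downstream models comes down to a graph traversal. The Python sketch below illustrates the idea with a breadth-first walk over child edges; it is not dbt's selection logic, although the result is similar in spirit to selecting a model together with everything downstream of it. The helper `downstream_of` and all model names are hypothetical.

```python
from collections import deque


def downstream_of(model, parents):
    """Return every model that depends on `model`, directly or indirectly."""
    # Invert the parent map so we can walk from a model to its children.
    children = {name: set() for name in parents}
    for name, ups in parents.items():
        for up in ups:
            children.setdefault(up, set()).add(name)

    affected, queue = set(), deque([model])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


parents = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "customer_summary": {"stg_customers"},
    "daily_revenue": {"orders_enriched"},
}

# If stg_orders failed, these are the models worth rerunning after the fix.
print(sorted(downstream_of("stg_orders", parents)))
# ['daily_revenue', 'orders_enriched']
```

`customer_summary` is left out because no path leads to it from `stg_orders`, which is exactly what makes a targeted rerun cheaper than a full one.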
Connections
Workflow Orchestration
Building a DAG of models is a specific example of workflow orchestration where tasks depend on each other.
Understanding DAGs in dbt helps grasp how tools like Airflow or Prefect schedule and manage complex workflows.
Project Management Dependencies
The DAG concept parallels task dependencies in project management charts like Gantt charts.
Knowing DAGs clarifies how to plan and sequence tasks in any project to avoid bottlenecks and delays.
Biological Pathways
DAGs resemble biological pathways where reactions depend on previous steps without cycles.
Seeing DAGs as natural processes helps appreciate their role in ensuring orderly progression in complex systems.
Common Pitfalls
#1 Creating circular dependencies between models.
Wrong approach: Model A SQL: select * from {{ ref('model_b') }}; Model B SQL: select * from {{ ref('model_a') }};
Correct approach: Refactor models so one does not depend on the other directly or indirectly, for example: Model A SQL: select * from source_table; Model B SQL: select * from {{ ref('model_a') }};
Root cause: Misunderstanding that references create dependencies that must not form loops.
#2 Assuming models run in file order, causing unexpected results.
Wrong approach: Writing models without references and expecting them to run in a specific sequence based on file names.
Correct approach: Use {{ ref() }} to explicitly declare dependencies so dbt knows the correct order.
Root cause: Not realizing dbt relies on the DAG, not file order, to schedule model runs.
#3 Overloading a single model with too many dependencies, making the DAG complex and slow.
Wrong approach: One model referencing many others unnecessarily, e.g., Model C SQL: select * from {{ ref('model_a') }}, {{ ref('model_b') }}, {{ ref('model_d') }}, ...
Correct approach: Break complex models into smaller, focused models with clear dependencies.
Root cause: Lack of modular design and misunderstanding of how to manage complexity in DAGs.
Key Takeaways
A DAG of models in dbt organizes data transformations by showing dependencies and execution order.
Using {{ ref() }} creates clear dependencies that dbt uses to build the DAG and run models correctly.
The DAG must be acyclic to avoid execution errors caused by circular dependencies.
Visualizing the DAG helps understand complex projects and speeds up debugging and development.
Expert use of DAGs includes optimizing parallel execution and managing large projects with modular design.