Building a DAG of models in dbt - Time & Space Complexity
When building a DAG of models in dbt, we want to understand how the time to run all models grows as more models are added. The question is: how does the total work increase as the number of models grows?
Analyze the time complexity of the following dbt model dependencies.
```sql
-- model_a.sql
select * from source_table

-- model_b.sql
select * from {{ ref('model_a') }}

-- model_c.sql
select * from {{ ref('model_b') }}

-- model_d.sql
select * from {{ ref('model_a') }}

-- model_e.sql
select * from {{ ref('model_c') }} join {{ ref('model_d') }} on ...
```
This code shows models depending on others, forming a Directed Acyclic Graph (DAG) of dependencies.
Consider how dbt schedules these models based on their dependencies:
- Primary operation: Running each model once after its dependencies.
- How many times: Each model runs exactly one time.
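The scheduling described above is a topological traversal of the DAG. A minimal sketch in Python (not dbt's actual scheduler) using the standard library's `graphlib`, with the dependency graph taken from the SQL example:

```python
from graphlib import TopologicalSorter

# Each model maps to the set of models it refs (from the example above).
deps = {
    "model_a": set(),               # reads only from source_table
    "model_b": {"model_a"},
    "model_c": {"model_b"},
    "model_d": {"model_a"},
    "model_e": {"model_c", "model_d"},
}

# static_order() yields each model exactly once,
# and only after all of its dependencies have been yielded.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Note that several valid orders exist (for example, `model_b` and `model_d` may swap), but every valid order runs each model exactly once after its dependencies.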
As you add more models, the total work grows in proportion to the number of models.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 models | 10 runs |
| 100 models | 100 runs |
| 1000 models | 1000 runs |
Pattern observation: The total work grows linearly as you add more models.
Time Complexity: O(n)
This means the total time to build all models grows directly with the number of models.
[X] Wrong: "Running one model means running all its dependencies multiple times."
[OK] Correct: dbt runs each model once and reuses results, so dependencies are not rerun repeatedly.
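The "build once, reuse" behavior can be illustrated with a memoized build function — a hypothetical sketch, not dbt's implementation:

```python
# Dependency graph from the example above.
deps = {
    "model_a": [],
    "model_b": ["model_a"],
    "model_c": ["model_b"],
    "model_d": ["model_a"],
    "model_e": ["model_c", "model_d"],
}

built = {}                          # cache of completed models
run_count = {m: 0 for m in deps}    # how many times each model actually ran

def build(model):
    # A model that is already built is reused, never rerun.
    if model in built:
        return built[model]
    for dep in deps[model]:
        build(dep)                  # ensure dependencies exist first
    run_count[model] += 1           # the model itself runs exactly once
    built[model] = f"table:{model}"
    return built[model]

build("model_e")
# model_a is referenced by both model_b and model_d, yet it ran only once:
print(run_count)  # {'model_a': 1, 'model_b': 1, 'model_c': 1, 'model_d': 1, 'model_e': 1}
```

The cache check at the top of `build` is what keeps the total work at O(n): without it, shared dependencies like `model_a` would be rebuilt once per downstream path.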
Understanding how work grows with model count helps you design efficient data pipelines and explain your approach clearly.
"What if some models depend on many others and run slower? How would that affect the overall time complexity?"