One model per source table rule in dbt - Time & Space Complexity
When using dbt, we often create one model for each source table. This helps keep things clear and organized.
We want to understand how the time to run these models grows as the number of source tables increases.
Analyze the time complexity of this dbt project structure.
```sql
-- models/customer.sql
select * from {{ source('sales', 'customer') }}

-- models/orders.sql
select * from {{ source('sales', 'orders') }}

-- models/products.sql
select * from {{ source('inventory', 'products') }}

-- models/payments.sql
select * from {{ source('finance', 'payments') }}
```
This project has one model for each source table, each simply selecting from its source.
Each model runs a query that reads one source table.
- Primary operation: Running a query on a source table.
- How many times: Once per source table (one model per source).
As the number of source tables grows, the number of models grows the same way.
| Number of Source Tables (n) | Number of Models Run |
|---|---|
| 10 | 10 |
| 100 | 100 |
| 1000 | 1000 |
Pattern observation: The number of queries, and therefore the total work, grows in direct proportion to the number of source tables.
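The one-to-one relationship between source tables and models can be sketched in a few lines of Python. This is only an illustration: `render_staging_models` and `SOURCES` are hypothetical names, not part of dbt, and the sketch assumes every model is a plain pass-through `select *` like the ones in the example project.

```python
def render_staging_models(sources):
    """Return {model_path: sql}, one pass-through model per (source_name, table_name) pair.

    Hypothetical helper for illustration; dbt itself does not provide it.
    """
    models = {}
    for source_name, table_name in sources:
        path = f"models/{table_name}.sql"
        # Doubled braces in the f-string produce the literal {{ ... }} Jinja delimiters.
        sql = f"select * from {{{{ source('{source_name}', '{table_name}') }}}}"
        models[path] = sql
    return models


# The four source tables used in the example project.
SOURCES = [
    ("sales", "customer"),
    ("sales", "orders"),
    ("inventory", "products"),
    ("finance", "payments"),
]

models = render_staging_models(SOURCES)
for path, sql in models.items():
    print(path, "->", sql)
print(len(models), "models for", len(SOURCES), "source tables")
```

Adding a fifth entry to `SOURCES` adds exactly one more model and one more query, which is the pattern the table shows.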
Time Complexity: O(n)
Here n is the number of source tables: the total time to run all models grows linearly with n.
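One way to write the reasoning down, as a sketch under stated assumptions: let $t_i$ be the time to run the model for source table $i$, and assume every $t_i$ is bounded by some constant $t_{\max}$ that does not depend on $n$. Both symbols are introduced here for illustration and are not part of dbt.

```latex
T(n) = \sum_{i=1}^{n} t_i \;\le\; n \cdot t_{\max}
\quad\Longrightarrow\quad
T(n) = O(n)
```

If the source tables themselves keep growing, $t_{\max}$ is no longer a constant and the size of each table has to enter the analysis as well.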
[X] Wrong: "Adding more source tables won't affect total run time much because each model runs independently."
[OK] Correct: Even when models run independently, each one still issues its own query against its source table, so the total work is the sum over all models.
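Running models in parallel does not change this conclusion. dbt can run independent models concurrently, up to its `threads` setting, but with a fixed thread count that only divides the run time by a constant. The rough estimate below is a simplification, not a measurement: `estimated_wall_clock_seconds` is a hypothetical helper, and it assumes every model takes the same time and that the warehouse handles all threads without slowing down.

```python
import math


def estimated_wall_clock_seconds(n_models, seconds_per_model, threads=1):
    """Rough wall-clock estimate for n independent models of equal cost.

    Hypothetical model: the models run in batches of `threads`,
    and each batch takes `seconds_per_model`.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    batches = math.ceil(n_models / threads)
    return batches * seconds_per_model


for n in (10, 100, 1000):
    sequential = estimated_wall_clock_seconds(n, 2.0, threads=1)
    parallel = estimated_wall_clock_seconds(n, 2.0, threads=4)
    print(f"n={n}: 1 thread = {sequential}s, 4 threads = {parallel}s")
```

With four threads the estimate is about a quarter of the sequential one at every size, yet it still grows tenfold whenever n grows tenfold, so the complexity remains O(n).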
Understanding how your dbt project scales as source tables are added helps you plan for growth and keep your data pipeline efficient.
"What if we combined multiple source tables into fewer models? How would that change the time complexity?"