Why optimization reduces warehouse costs in dbt - Performance Analysis
We want to see how optimizing dbt models affects the time they take to run.
How does reducing repeated work lower costs in data warehouses?
Analyze the time complexity of the following dbt model code.
```sql
with base as (
    select * from source_table
),

filtered as (
    select * from base
    where status = 'active'
),

aggregated as (
    select user_id, count(*) as total
    from filtered
    group by user_id
)

select * from aggregated
```
This model filters the rows down to active users and then groups them to count rows per user. When analyzing a model like this, look for steps that scan or group the same data more than once.
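One way to reduce intermediate steps is to fold the filter and the aggregation into a single pass. The sketch below assumes the same `source_table` and `status` column as the model above; it is an illustration, not a guaranteed drop-in replacement, since warehouses often inline simple CTEs on their own:

```sql
-- Sketch: filter and aggregate in one pass over source_table.
-- The warehouse scans each row once, applies the status predicate,
-- and updates the per-user count, so the work is still O(n) but
-- there are fewer intermediate steps to materialize.
select
    user_id,
    count(*) as total
from source_table
where status = 'active'
group by user_id
```

Because modern query optimizers frequently fuse chained CTEs anyway, measure the actual run time before and after refactoring rather than assuming the rewrite is faster.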
- Primary operation: scanning all rows in `source_table` and grouping by `user_id`.
- How many times: each step processes the full (or filtered) data once.
As the number of rows grows, the time to scan and group grows too.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | ~10 row scans and group updates |
| 100 | ~100 row scans and group updates |
| 1000 | ~1000 row scans and group updates |
Pattern observation: The work grows roughly in direct proportion to the number of rows.
Time Complexity: O(n)
This means the time to run grows linearly as the data size grows.
[X] Wrong: "Adding more steps in dbt models does not affect run time much."
[OK] Correct: Each step processes data and adds to total time, so more steps usually mean more work and higher cost.
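To see why extra steps add cost, consider a hypothetical variant that scans `source_table` twice, once to count and once to find active users, and then joins the results. This is an anti-pattern sketch for illustration, using the same columns as the model above:

```sql
-- Hypothetical anti-pattern: two independent scans of source_table.
with counts as (
    select user_id, count(*) as total
    from source_table
    group by user_id
),

active_users as (
    select distinct user_id
    from source_table
    where status = 'active'
)

-- The join adds further work on top of the two full scans.
select c.user_id, c.total
from counts c
join active_users a on c.user_id = a.user_id
```

Asymptotically this is still O(n), but it does roughly two full passes instead of one, and on warehouses that bill by bytes scanned that can come close to doubling the scan cost of the model.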
Understanding how data size affects processing time helps you write efficient dbt models that save time and money.
"What if we added an extra filter before grouping? How would that change the time complexity?"
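One way to sketch that extra filter, assuming a hypothetical `created_at` column on `source_table` (introduced here only for illustration):

```sql
with base as (
    select * from source_table
),

filtered as (
    -- Extra filter: created_at is a hypothetical column.
    -- Each row is still examined exactly once, so this pass
    -- remains O(n); the grouping step simply sees fewer rows.
    select * from base
    where status = 'active'
      and created_at >= '2024-01-01'
),

aggregated as (
    select user_id, count(*) as total
    from filtered
    group by user_id
)

select * from aggregated
```

The time complexity stays linear in the number of input rows; what changes is the constant factor, since downstream steps process fewer rows after the stricter filter.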