Query profiling and optimization in dbt - Time & Space Complexity
When we profile and optimize queries in dbt, the key question is how a query's running time changes as the data grows: how much more work does the query do when the input gets bigger?
Analyze the time complexity of the following dbt SQL model.
```sql
select
    user_id,
    count(*) as total_orders
from {{ ref('orders') }}
where order_date >= '2023-01-01'
group by user_id
```
This query counts orders per user since the start of 2023.
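To make the model concrete, here is a minimal sketch that runs the equivalent query against an in-memory SQLite database. The table name and sample rows are hypothetical stand-ins; `{{ ref('orders') }}` is replaced by a literal `orders` table, which is roughly what dbt does when it compiles the model.

```python
import sqlite3

# Hypothetical stand-in for the compiled dbt model: a plain "orders"
# table replaces {{ ref('orders') }}.
con = sqlite3.connect(":memory:")
con.execute("create table orders (user_id integer, order_date text)")
con.executemany(
    "insert into orders values (?, ?)",
    [
        (1, "2023-02-01"),
        (1, "2023-03-05"),
        (2, "2022-12-31"),  # filtered out by the WHERE clause
        (2, "2023-01-15"),
    ],
)
rows = con.execute(
    """
    select user_id, count(*) as total_orders
    from orders
    where order_date >= '2023-01-01'
    group by user_id
    """
).fetchall()
# Each user's 2023 orders are counted; the 2022 row never reaches
# the aggregate.
```

Running this yields one row per user with their 2023 order count, matching the description above.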
Look for repeated work inside the query.
- Primary operation: Scanning rows of the orders table and evaluating the date filter on each one.
- How many times: Once per row scanned; each row that passes the filter then contributes one grouping step.
As the number of orders grows, the query must check more rows and group them.
| Input Size (n) | Approx. Operations |
|---|---|
| 10,000 orders | About 10,000 row checks and grouping steps |
| 100,000 orders | About 100,000 row checks and grouping steps |
| 1,000,000 orders | About 1,000,000 row checks and grouping steps |
Pattern observation: The work grows roughly in direct proportion to the number of rows.
Time Complexity: O(n)
This means the query time grows linearly as the number of orders increases.
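The linear pattern in the table above can be simulated directly. The sketch below (a simplified model of what the database engine does, not its actual implementation) makes one pass over the rows and one hash-map update per matching row, and counts the operations:

```python
from collections import defaultdict

def count_orders_per_user(orders, cutoff="2023-01-01"):
    # Simulates the model: one pass over the rows, one hash-map
    # update per matching row -> O(n) time, O(u) space for u users.
    ops = 0
    totals = defaultdict(int)
    for user_id, order_date in orders:
        ops += 1                  # each row is checked exactly once
        if order_date >= cutoff:
            totals[user_id] += 1  # constant-time hash update
    return dict(totals), ops

sample = [(1, "2023-02-01"), (2, "2022-12-31"), (1, "2023-03-05")]
result, ops = count_orders_per_user(sample)
# ops equals len(sample): the work grows linearly with the input.
```

Doubling the input list doubles `ops`, which is exactly what O(n) predicts.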
[X] Wrong: "Adding a WHERE filter always makes the query run in constant time."
[OK] Correct: The filter reduces the rows that reach the aggregate, but without an index the engine still examines every row to evaluate the WHERE clause, so time still grows linearly with the table size.
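This misconception is easy to demonstrate: even a highly selective filter does not reduce the number of rows examined when there is no index. In the sketch below (hypothetical data), only one row out of a thousand matches, yet every row is still checked:

```python
def filtered_count(orders, cutoff):
    # Without an index, the engine must inspect every row to
    # evaluate the WHERE clause, even if almost none survive it.
    checks = 0
    matched = 0
    for _, order_date in orders:
        checks += 1
        if order_date >= cutoff:
            matched += 1
    return checks, matched

rows = [(i, "2022-06-01") for i in range(999)] + [(999, "2023-05-01")]
checks, matched = filtered_count(rows, "2023-01-01")
# checks == 1000 even though matched == 1: the filter shrinks the
# output, not the scan.
```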
Understanding how query time grows helps you write efficient dbt models and explain your choices clearly in interviews.
What if we added an index on order_date? How would the time complexity change?
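As a hint, a sorted structure lets the engine skip non-matching rows instead of checking all of them. The sketch below uses a sorted Python list and binary search as a rough, hypothetical stand-in for a B-tree index on order_date:

```python
from bisect import bisect_left

# A sorted list of order_date strings stands in for a B-tree index.
# Finding the first qualifying date is a binary search, O(log n),
# and only the m matching entries are visited after that.
dates = sorted([
    "2022-01-10", "2022-06-01", "2022-12-31",
    "2023-01-15", "2023-02-01", "2023-03-05",
])
start = bisect_left(dates, "2023-01-01")  # ~log2(n) comparisons
matching = dates[start:]                  # only rows passing the filter
# The GROUP BY over the m matching rows is still O(m), so the query
# as a whole becomes roughly O(log n + m) instead of O(n).
```

Under this model, an index helps most when the filter is selective (m is small relative to n); if most rows match, the query remains effectively linear.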