# Creating your first model in dbt - Performance & Efficiency
When creating your first model in dbt, it helps to understand how your model's run time grows as your data grows.
In other words: how does the work the data warehouse does on dbt's behalf change when the input data size changes?
Analyze the time complexity of the following dbt model SQL code.
```sql
select
    customer_id,
    count(order_id) as total_orders
from {{ ref('raw_orders') }}
group by customer_id
```
This code creates a model that counts orders per customer from a raw orders table.
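To make the hidden "loop" concrete, here is a minimal Python sketch of what the GROUP BY conceptually does: one pass over the rows, incrementing a per-customer counter. The function name and sample rows are illustrative, not part of dbt or any warehouse API.

```python
from collections import Counter

def count_orders_per_customer(rows):
    """rows: iterable of (customer_id, order_id) tuples."""
    totals = Counter()
    for customer_id, _order_id in rows:  # one visit per row: this is the "loop"
        totals[customer_id] += 1
    return dict(totals)

rows = [("c1", 101), ("c2", 102), ("c1", 103)]
print(count_orders_per_customer(rows))  # → {'c1': 2, 'c2': 1}
```

The dictionary lookup and increment are roughly constant-time per row, so the total work is driven by the number of rows visited.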
Identify the repeating operations (the SQL equivalents of loops, recursion, or array traversals).
- Primary operation: Scanning all rows in the raw_orders table.
- How many times: Once over all rows to group and count orders per customer.
As the number of orders grows, the time to scan and group them grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 rows scanned and grouped |
| 100 | About 100 rows scanned and grouped |
| 1000 | About 1000 rows scanned and grouped |
Pattern observation: The work grows steadily as the number of rows grows.
Time Complexity: O(n)
This means the time to run the model grows roughly in direct proportion to the number of rows in the input table. (This assumes the warehouse uses hash-based aggregation; a sort-based GROUP BY would be O(n log n), which still grows nearly linearly in practice.)
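The pattern in the table above can be checked with a quick simulation. This is an illustrative Python sketch of the scan-and-group pass, counting one operation per row visited; the row data is made up for the experiment:

```python
def operations_for(n):
    """Simulate scanning and grouping n order rows; return the operation count."""
    # Hypothetical data: n orders spread across 10 distinct customers.
    rows = [(f"c{i % 10}", i) for i in range(n)]
    ops = 0
    totals = {}
    for customer_id, _order_id in rows:
        ops += 1  # each row is visited exactly once
        totals[customer_id] = totals.get(customer_id, 0) + 1
    return ops

for n in (10, 100, 1000):
    print(n, operations_for(n))  # operation count tracks row count: 10, 100, 1000
```

Ten times the rows means roughly ten times the operations, which is exactly the O(n) pattern in the table.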
[X] Wrong: "Grouping data is instant and does not depend on input size."
[OK] Correct: Grouping requires looking at each row to count, so it takes more time as data grows.
Understanding how data size affects model run time helps you write efficient dbt models and explain your reasoning clearly.
"What if we added a filter to only include recent orders? How would that change the time complexity?"