Unit testing dbt models - Time & Space Complexity
When we run unit tests on dbt models, we want to know how the time they take grows as the data grows. The question: how does testing time change when we add more rows or more tests?
Analyze the time complexity of the following dbt test code snippet.
```sql
-- Simple uniqueness test on a model column
select
    id,
    count(*) as id_count
from {{ ref('my_model') }}
group by id
having count(*) > 1
```
This test checks whether the `id` column in the model contains duplicate values: it returns every `id` that appears more than once.
Look for repeated work in the test query.
- Primary operation: Scanning all rows of the model to group by 'id'.
- How many times: Once over the entire dataset for grouping and counting.
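To make that single pass concrete, here is a minimal Python sketch of what a SQL engine's hash aggregation roughly does for this test. The function name `find_duplicate_ids` and the row layout are illustrative assumptions, not part of dbt.

```python
from collections import Counter

def find_duplicate_ids(rows):
    """Return ids that appear more than once.

    Mirrors the test query: one scan to group and count,
    then a filter equivalent to HAVING count(*) > 1.
    """
    counts = Counter(row["id"] for row in rows)  # one pass over all rows
    return sorted(id_ for id_, c in counts.items() if c > 1)

rows = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}, {"id": 3}, {"id": 3}]
print(find_duplicate_ids(rows))  # [2, 3]
```

Every row is visited exactly once to build the counts, which is why the work scales with the number of rows.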
As the number of rows in the model grows, the grouping and counting take more time.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 rows scanned and grouped |
| 100 | About 100 rows scanned and grouped |
| 1000 | About 1000 rows scanned and grouped |
Pattern observation: The work grows roughly in direct proportion to the number of rows.
Time Complexity: O(n)
This means the test time grows linearly as the data size increases. (On the space side, the GROUP BY hash table holds one entry per distinct `id`, so memory use is at most O(n) as well.)
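A quick way to see the linear pattern from the table is to count the per-row operations directly. This is a sketch with synthetic ids, not a real benchmark:

```python
from collections import Counter

def count_scan_operations(n):
    """Simulate the uniqueness test on n rows and count row visits."""
    rows = list(range(n))  # n synthetic ids
    counts = Counter()
    ops = 0
    for id_ in rows:       # the single grouping pass
        counts[id_] += 1
        ops += 1
    return ops

for n in (10, 100, 1000):
    print(n, count_scan_operations(n))  # ops == n: linear growth
```

The operation count matches the input size exactly, which is the O(n) pattern the table shows.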
[X] Wrong: "Unit tests run instantly no matter how big the data is."
[OK] Correct: The test scans all data rows, so bigger data means longer test time.
Understanding how test time grows helps you write efficient tests and explain your choices clearly.
"What if we added multiple columns to test uniqueness on? How would the time complexity change?"
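One way to explore that question in miniature: group by a tuple of columns instead of a single value. The scan is still one pass over n rows, so the complexity stays O(n); only the per-row cost of building the key grows with the number of columns k (roughly O(n·k)). A hypothetical Python sketch, with illustrative names:

```python
from collections import Counter

def find_duplicate_keys(rows, columns):
    """Find rows whose combination of `columns` is not unique.

    Still one pass over all rows: O(n) rows scanned,
    with O(k) work per row to build the k-column key.
    """
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return sorted(key for key, c in counts.items() if c > 1)

rows = [
    {"id": 1, "region": "eu"},
    {"id": 1, "region": "us"},
    {"id": 1, "region": "us"},  # duplicate (id, region) pair
]
print(find_duplicate_keys(rows, ["id", "region"]))  # [(1, 'us')]
```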