dbt-utils package tests - Time & Space Complexity
When running dbt-utils package tests, we want to know how long they take to complete as the data grows. The question: how does test execution time grow when the input data size increases? As an example, let's analyze the time complexity of a uniqueness test from dbt-utils.
```yaml
# models/schema.yml -- uniqueness test from the dbt-utils package
models:
  - name: users
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - user_id
```
This test checks if the column 'user_id' in the 'users' table has unique values.
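Under the hood, the test compiles to a validation query that groups rows and flags any group appearing more than once. The exact compiled SQL differs in detail, but a rough sketch of its shape, run here against an in-memory SQLite database with illustrative data, looks like this:

```python
import sqlite3

# In-memory database with an illustrative 'users' table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,), (2,)])

# Roughly the shape of the validation query the test compiles to:
# group by the column(s) and flag any group with more than one row.
duplicates = conn.execute("""
    SELECT user_id, COUNT(*) AS n
    FROM users
    GROUP BY user_id
    HAVING COUNT(*) > 1
""").fetchall()

print(duplicates)  # user_id 2 appears twice
conn.close()
```

The test passes only when this query returns zero rows; every row of the table must be read to produce that answer.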
Look at what repeats when the test runs.
- Primary operation: Scanning every row in the 'users' table and grouping by 'user_id' to find duplicates.
- How many times: Each row is read and hashed once; any group containing more than one row is flagged as a duplicate. (Note: the database does not compare each 'user_id' to every other one, which would be quadratic; hash-based grouping touches each row roughly once.)
As the number of rows grows, the test must check more data.
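That single pass can be sketched in Python with a hash set (the function and sample data are illustrative, not part of dbt-utils): each 'user_id' is looked up and inserted once, so the work grows with the row count.

```python
def find_duplicates(user_ids):
    """Single pass over the rows: set lookups and inserts are O(1) on
    average, so the whole scan is O(n) in the number of rows."""
    seen = set()
    duplicates = set()
    for uid in user_ids:
        if uid in seen:
            duplicates.add(uid)  # seen before: it's a duplicate
        else:
            seen.add(uid)
    return duplicates

print(find_duplicates([1, 2, 3, 2, 5]))  # {2}
```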
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | Checking 10 rows for duplicates |
| 100 | Checking 100 rows for duplicates |
| 1000 | Checking 1000 rows for duplicates |
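A quick way to confirm the table's pattern: count how many per-row checks a set-based duplicate scan performs at each input size (a hypothetical stand-in for the database's scan).

```python
def count_checks(n):
    """Count per-row operations in a set-based duplicate scan over n rows."""
    seen = set()
    checks = 0
    for uid in range(n):
        checks += 1  # one membership check per row
        seen.add(uid)
    return checks

for n in (10, 100, 1000):
    print(n, count_checks(n))  # the operation count matches n exactly
```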
Pattern observation: The operations grow roughly in direct proportion to the number of rows.
Time Complexity: O(n)
This means the test time grows linearly as the number of rows increases.
[X] Wrong: "The test runs instantly no matter how big the table is."
[OK] Correct: The test must scan all rows to find duplicates, so more rows mean more work and a longer runtime.
Understanding how tests scale with data size helps you write efficient data checks and shows you think about performance in real projects.
"What if we changed the test to check uniqueness on two columns instead of one? How would the time complexity change?"