Built-in tests (unique, not_null, accepted_values, relationships) in dbt - Time & Space Complexity
We want to understand how the running time of dbt's built-in tests changes as the data grows: how does test execution time scale as the input size increases? Let's analyze the time complexity of each built-in test.
```sql
-- Unique test example: ids that appear more than once
select id from {{ ref('my_table') }} group by id having count(*) > 1

-- Not null test example: rows with a missing id
select id from {{ ref('my_table') }} where id is null

-- Accepted values test example: statuses outside the allowed set
select id from {{ ref('my_table') }} where status not in ('active', 'inactive')

-- Relationships test example: children whose parent does not exist
select child.id
from {{ ref('child_table') }} child
left join {{ ref('parent_table') }} parent
  on child.parent_id = parent.id
where parent.id is null
```
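In practice you rarely write these queries by hand: dbt generates them from test declarations in a schema file. A minimal sketch of such a file, assuming the table and column names used in the examples above:

```yaml
version: 2

models:
  - name: my_table
    columns:
      - name: id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive']
  - name: child_table
    columns:
      - name: parent_id
        tests:
          - relationships:
              to: ref('parent_table')
              field: id
```

Either way, the SQL the warehouse executes is the same, so the complexity analysis below applies equally to declared tests.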
These tests check for duplicates, missing values, invalid values, and broken links between tables.
To analyze their cost, look at what work repeats as the data grows.
- Primary operation: scanning every row in the table(s).
- How many times: once per test run; each row is checked (or grouped) exactly once.

As the number of rows grows, the work grows at roughly the same rate.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row checks |
| 100 | About 100 row checks |
| 1000 | About 1000 row checks |
Pattern observation: The time grows linearly with the number of rows.
Time Complexity: O(n)
Space Complexity: roughly O(1) extra memory for the simple scans (not_null, accepted_values); the unique test's grouping and the relationships join can need memory proportional to the number of distinct keys.
This means the time to run these tests grows directly in proportion to the number of rows checked.
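The linear pattern is easy to see outside of SQL. Here is a minimal sketch (plain Python, not dbt itself) of what the warehouse effectively does for the unique test: one pass over the rows, grouping ids and counting.

```python
from collections import Counter

def failing_unique_ids(rows):
    """Return ids that appear more than once -- the ids dbt's unique
    test would flag. Every row is touched exactly once, so the work
    is proportional to the number of rows: O(n)."""
    counts = Counter(row["id"] for row in rows)  # single scan over all rows
    return sorted(id_ for id_, c in counts.items() if c > 1)

rows = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}]
print(failing_unique_ids(rows))  # [2]
```

Doubling the length of `rows` doubles the work of the single scan, which is exactly the linear pattern in the table above.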
[X] Wrong: "These tests run instantly no matter how big the data is."
[OK] Correct: Each test must look at every row or group, so more data means more work and more time.
Understanding how test time grows helps you explain performance in real projects and shows that you think practically about data scale.
"What if the accepted_values test checked against a list that grows with the data size? How would the time complexity change?"
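One way to reason about that follow-up question, sketched in plain Python (an assumption: this stands in for how the database might evaluate membership, which in practice depends on the engine): checking each of n rows against a list of m allowed values is O(n * m) if the list is scanned per row, but a hash set brings the per-row check back to O(1) on average, restoring O(n) overall.

```python
def invalid_rows_list(statuses, allowed_list):
    """Naive version: each of the n rows scans the m-item list,
    so the total work is O(n * m)."""
    return [s for s in statuses if s not in allowed_list]

def invalid_rows_set(statuses, allowed_set):
    """Hashed version: each membership check is O(1) on average,
    so the total work is O(n)."""
    return [s for s in statuses if s not in allowed_set]

statuses = ["active", "inactive", "pending"]
print(invalid_rows_list(statuses, ["active", "inactive"]))  # ['pending']
print(invalid_rows_set(statuses, {"active", "inactive"}))   # ['pending']
```

Both functions return the same rows; only the cost of each membership check differs, which is the crux of the question.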