Generic tests with parameters in dbt - Time & Space Complexity
We want to understand how the time needed to run generic tests with parameters in dbt changes as the data grows.
How does the test execution time grow when we check more rows or add more parameters?
Analyze the time complexity of the following dbt generic test with parameters.
```yaml
version: 2

models:
  - name: customers
    columns:
      - name: email
        tests:
          - unique:
              where: "status = 'active' AND email IS NOT NULL"
```
This test checks that emails are unique among active customers; the `where` parameter restricts the check to active rows with non-null emails.
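When dbt runs this test, it compiles it into a SQL query along these lines (a simplified sketch; the exact SQL depends on your dbt version and adapter):

```sql
-- Simplified sketch of the query dbt generates for the unique test.
-- The where parameter is injected as a row filter before grouping.
select
    email as unique_field,
    count(*) as n_records
from customers
where status = 'active' and email is not null
group by email
having count(*) > 1
```

The test passes when this query returns zero rows, so the database must scan every row that survives the filter.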
Look at what repeats when this test runs.
- Primary operation: Scanning the rows of `customers` that match the `where` condition and checking each surviving `email` for duplicates.
- How many times: Once per test execution, but the number of rows scanned depends on table size and filter selectivity.
The test scans filtered rows to check uniqueness. As the number of filtered rows grows, the work grows too.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row checks |
| 100 | About 100 row checks |
| 1000 | About 1000 row checks |
Pattern observation: The operations grow roughly in direct proportion to the number of rows matching the filters.
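You can estimate n for your own data by counting the rows that survive the filter (assuming a `customers` table shaped like the one above):

```sql
-- Count how many rows the unique test will actually scan and group.
select count(*) as filtered_rows
from customers
where status = 'active' and email is not null;
```

This `filtered_rows` value, not the total table size, is the n that drives the test's runtime.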
Time Complexity: O(n)
This means the test time grows roughly linearly with the number of rows it needs to check (assuming the database uses hash-based grouping; a sort-based plan would be closer to O(n log n)).
[X] Wrong: "Adding parameters makes the test run faster or slower in a fixed way regardless of data size."
[OK] Correct: The parameters only filter which rows are checked, so the time depends on how many rows match, not just on having parameters.
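For example, tightening the `where` parameter shrinks the filtered set, so fewer rows are scanned, but the growth pattern stays linear in whatever matches (the `created_at` column here is hypothetical, for illustration only):

```yaml
tests:
  - unique:
      where: "status = 'active' AND email IS NOT NULL AND created_at >= '2024-01-01'"
```

Stricter filters reduce the constant amount of work per run; they do not change the O(n) relationship between matching rows and time.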
Understanding how test time grows with data size and filters helps you write efficient data checks and explain performance in real projects.
"What if we added multiple parameters that filter the data more strictly? How would the time complexity change?"