Store test failures for analysis in dbt - Time & Space Complexity
When storing test failures in dbt for analysis, it is important to understand how processing time grows with the data. Specifically, we want to know how the time to save and analyze failures changes as the number of rows or failures increases.
Analyze the time complexity of the following dbt snippet that stores test failures.
```sql
with failed_tests as (
    select *
    from {{ ref('my_model') }}
    where test_result = 'fail'
),

store_failures as (
    select
        *,
        current_timestamp as failure_time
    from failed_tests
)

select * from store_failures
```
This code selects all failed test rows from a model and prepares them with a timestamp for storage or further analysis.
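For context, dbt can also persist failing rows natively. Below is a minimal sketch of a singular test with the `store_failures` config enabled; the test file name and the `test_result` column follow the snippet above and are assumptions, not a prescribed setup.

```sql
-- tests/my_model_failed_rows.sql (hypothetical file name)
-- A singular dbt test returns the rows that fail; with store_failures
-- enabled, dbt also writes those rows to an audit table for later analysis.
{{ config(store_failures = true) }}

select *
from {{ ref('my_model') }}
where test_result = 'fail'
```

Either way, the query still has to scan the model to find the failures, so the complexity analysis below applies.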
Look for repeated actions that affect time.
- Primary operation: Scanning all rows in the model to find failures.
- How many times: Once per run, but that single pass touches every row in the model.
The time to find failures grows as the number of rows in the model grows.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 rows scanned |
| 100 | 100 rows scanned |
| 1000 | 1000 rows scanned |
Pattern observation: The work grows directly with the number of rows; doubling rows doubles the work.
Time Complexity: O(n)
This means the time to store test failures grows linearly with the number of rows checked.
[X] Wrong: "Storing failures only takes constant time regardless of data size."
[OK] Correct: Because the code scans every row to find failures, more rows mean more work and more time.
Understanding how data size affects processing time helps you explain and improve data workflows clearly and confidently.
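One common improvement when full scans become slow is to process only the new rows each run. Here is a minimal sketch, assuming an incremental dbt model and a `loaded_at` timestamp column on `my_model`; that column is an assumption, not part of the original snippet.

```sql
-- models/stored_test_failures.sql (hypothetical)
-- Incremental materialization: after the first run, only rows newer than
-- what is already stored are scanned, so per-run work tracks new rows
-- rather than the full table.
{{ config(materialized = 'incremental') }}

select
    *,
    current_timestamp as failure_time
from {{ ref('my_model') }}
where test_result = 'fail'
{% if is_incremental() %}
  -- assumes my_model exposes a loaded_at timestamp to filter on
  and loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```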
"What if we indexed the test_result column? How would that change the time complexity?"