Source freshness checks in dbt - Time & Space Complexity
We want to understand how the time it takes to check source freshness grows as the number of configured source tables increases.
How does dbt handle checking many sources and their freshness efficiently?
Analyze the time complexity of the following dbt source freshness check snippet.
```yaml
sources:
  - name: my_source
    freshness:
      warn_after:
        count: 12
        period: hour
      error_after:
        count: 24
        period: hour
# dbt runs freshness checks for each source table
```
This snippet defines freshness rules for a source. dbt will check each source table's last update time against these rules.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Checking the freshness timestamp for each source table.
- How many times: Once per source table configured in dbt.
As the number of source tables grows, dbt checks each one individually.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 freshness checks |
| 100 | 100 freshness checks |
| 1000 | 1000 freshness checks |
Pattern observation: The number of freshness checks grows directly with the number of source tables.
Time Complexity: O(n)
This means the time to check freshness grows linearly with the number of source tables.
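The linear pattern above can be sketched in a few lines of Python. This is an illustrative stand-in, not dbt's internal implementation: the function and table names are hypothetical, and the real check involves querying the warehouse for each table's `loaded_at` value.

```python
from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at, now, warn_after, error_after):
    """Classify one source table as 'pass', 'warn', or 'error'
    based on how long ago it was last loaded."""
    age = now - last_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"

def check_all_sources(tables, now, warn_after, error_after):
    # One check per source table -> O(n) in the number of tables.
    return {name: check_freshness(ts, now, warn_after, error_after)
            for name, ts in tables.items()}

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
tables = {  # hypothetical source tables and their last load times
    "orders":    now - timedelta(hours=2),   # within warn_after
    "customers": now - timedelta(hours=15),  # past warn_after (12h)
    "events":    now - timedelta(hours=30),  # past error_after (24h)
}
results = check_all_sources(tables, now,
                            warn_after=timedelta(hours=12),
                            error_after=timedelta(hours=24))
print(results)  # → {'orders': 'pass', 'customers': 'warn', 'events': 'error'}
```

The dictionary comprehension visits every table exactly once, which is why doubling the number of sources doubles the work.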
[X] Wrong: "Checking freshness is constant time no matter how many sources there are."
[OK] Correct: Each source table requires its own check, so more sources mean more checks and more time.
Understanding how operations scale with input size helps you explain efficiency clearly and confidently in real projects.
"What if dbt cached freshness results and only checked sources updated recently? How would that affect time complexity?"