Source freshness checks in dbt - Time & Space Complexity
We want to understand how the time it takes to check source freshness grows as the amount of data increases.
How does dbt handle checking many sources and their freshness efficiently?
Analyze the time complexity of the following dbt source freshness check snippet.
sources:
  - name: my_source
    freshness:
      warn_after:
        count: 12
        period: hour
      error_after:
        count: 24
        period: hour
    # dbt runs freshness checks for each source table
This snippet defines freshness rules for a source. dbt will check each source table's last update time against these rules.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Checking the freshness timestamp for each source table.
- How many times: Once per source table configured in dbt.
As the number of source tables grows, dbt checks each one individually.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 freshness checks |
| 100 | 100 freshness checks |
| 1000 | 1000 freshness checks |
Pattern observation: The number of freshness checks grows directly with the number of source tables.
Time Complexity: O(n)
This means the time to check freshness grows linearly with the number of source tables.
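The linear growth can be illustrated with a small Python sketch. It is not how dbt is implemented: `count_freshness_checks`, `fake_check`, and the table names are hypothetical stand-ins, and each per-table check is modeled as a single unit of work.

```python
from typing import Callable, Iterable


def count_freshness_checks(tables: Iterable[str],
                           check_freshness: Callable[[str], None]) -> int:
    """Run one check per table and return how many checks ran."""
    checks = 0
    for table in tables:          # single pass over the configured tables
        check_freshness(table)    # hypothetical stand-in for the warehouse query
        checks += 1
    return checks


def fake_check(table: str) -> None:
    """Placeholder: a real check would look up the table's latest load time."""


for n in (10, 100, 1000):
    tables = [f"table_{i}" for i in range(n)]
    print(n, count_freshness_checks(tables, fake_check))
```

The printed counts match the table above. Running checks in parallel would likely shorten wall-clock time, but the total number of checks would still be proportional to n.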
[X] Wrong: "Checking freshness is constant time no matter how many sources there are."
[OK] Correct: Each source table requires its own check, so more sources mean more checks and more time.
Understanding how operations scale with input size helps you explain efficiency clearly and confidently in real projects.
"What if dbt cached freshness results and only checked sources updated recently? How would that affect time complexity?"
Practice
Solution
Step 1: Understand the role of freshness checks
Freshness checks monitor the age of data in source tables to ensure it is up-to-date.
Step 2: Compare options to the purpose
Only "To track how recent the data in your source tables is" describes tracking data recency, which matches the purpose of freshness checks.
Final Answer:
To track how recent the data in your source tables is -> Option A
Quick Check:
Freshness checks = track data recency [OK]
- Confusing freshness checks with table creation
- Thinking freshness checks optimize queries
- Assuming freshness checks schedule runs
Solution
Step 1: Recall correct YAML syntax for freshness
dbt expects warn_after and error_after as objects with count and period keys.
Step 2: Match options to syntax
The option below correctly uses the {count: X, period: day} format; the others use incorrect formats or swap thresholds.
freshness:
  warn_after: {count: 1, period: day}
  error_after: {count: 2, period: day}
Final Answer:
The option shown above -> Option B
Quick Check:
Use count and period keys in YAML freshness [OK]
- Using strings instead of objects for thresholds
- Swapping warn_after and error_after values
- Missing count or period keys
{"status": "", "max_loaded_at": "2024-04-20T00:00:00Z"}
Solution
Step 1: Calculate data age from last loaded timestamp
If today is 2024-04-23, data is 3 days old (2024-04-23 - 2024-04-20).
Step 2: Compare data age to thresholds
3 days > error_after (2 days), so status is error.
Final Answer:
error -> Option A
Quick Check:
Data age > error_after = error status [OK]
- Confusing warn_after and error_after thresholds
- Assuming status is warn for data older than error_after
- Ignoring current date when calculating age
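The age comparison in this solution can be sketched in Python. This is an illustration under assumptions, not dbt's implementation: `freshness_status` is a hypothetical helper, the strict `>` comparison is assumed, and the 1-day `warn_after` value is assumed because only the 2-day `error_after` threshold appears in the solution.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(max_loaded_at: datetime, now: datetime,
                     warn_after: timedelta, error_after: timedelta) -> str:
    """Classify data age against the two thresholds (hypothetical helper)."""
    age = now - max_loaded_at
    if age > error_after:      # check the stricter threshold first
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


max_loaded_at = datetime(2024, 4, 20, tzinfo=timezone.utc)
now = datetime(2024, 4, 23, tzinfo=timezone.utc)   # "today" from the solution

status = freshness_status(
    max_loaded_at,
    now,
    warn_after=timedelta(days=1),    # assumed value, not given in the solution
    error_after=timedelta(days=2),   # threshold stated in the solution
)
print(status)  # error
```

Checking `error_after` before `warn_after` avoids reporting "warn" for data that is already past the error threshold.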
sources:
  - name: my_source
    freshness:
      warn_after: {count: 1, period: day}
      error_after: {count: 2, period: days}
What is the likely cause of the error?
Solution
Step 1: Check period values in freshness YAML
dbt expects period values as singular strings like 'day', not plural 'days'.
Step 2: Identify error cause
Using 'days' causes a validation error; changing to 'day' fixes it.
Final Answer:
The period value 'days' should be singular 'day' -> Option D
Quick Check:
Period values must be singular like 'day' [OK]
- Using plural period names
- Swapping warn_after and error_after
- Adding unnecessary quotes around numbers
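A small Python lint can catch the plural-period mistake before dbt runs. This is a sketch under assumptions: `invalid_periods` is a hypothetical helper, the dictionary stands in for the already-parsed `freshness` block, and the allowed set (`minute`, `hour`, `day`) is taken as the singular periods dbt accepts.

```python
# Singular period names assumed to be the accepted values.
ALLOWED_PERIODS = {"minute", "hour", "day"}


def invalid_periods(freshness: dict) -> list[str]:
    """Return the threshold keys whose period is not an allowed value."""
    bad = []
    for key in ("warn_after", "error_after"):
        threshold = freshness.get(key)
        if threshold is not None and threshold.get("period") not in ALLOWED_PERIODS:
            bad.append(key)
    return bad


# Parsed form of the freshness block from the question above.
freshness = {
    "warn_after": {"count": 1, "period": "day"},
    "error_after": {"count": 2, "period": "days"},
}
print(invalid_periods(freshness))  # ['error_after']
```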
Solution
Step 1: Identify correct period and count values
Period should be singular 'hour', counts are numbers without quotes.
Step 2: Check warn_after and error_after order
warn_after must be less than error_after; 2 < 4 is correct.
Step 3: Validate options
The option below matches correct syntax and logic; Option A uses strings for counts, Option B uses plural 'hours', and Option D swaps thresholds.
freshness:
  warn_after: {count: 2, period: hour}
  error_after: {count: 4, period: hour}
Final Answer:
The option shown above -> Option C
Quick Check:
Use singular period and correct threshold order [OK]
- Using plural period names like 'hours'
- Putting counts as strings instead of numbers
- Swapping warn_after and error_after values
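The three checks in this solution (singular period, numeric count, warn before error) can be combined into one hypothetical Python lint. This is a sketch, not something dbt is confirmed to enforce in this form: `to_timedelta` and `thresholds_ordered` are illustrative names, and the dictionaries stand in for parsed YAML.

```python
from datetime import timedelta

# Maps each singular period name to the matching timedelta keyword argument.
PERIOD_TO_UNIT = {"minute": "minutes", "hour": "hours", "day": "days"}


def to_timedelta(threshold: dict) -> timedelta:
    """Convert a {count, period} threshold to a timedelta, validating both keys."""
    count, period = threshold["count"], threshold["period"]
    if not isinstance(count, int) or isinstance(count, bool):
        raise TypeError(f"count must be an integer, got {count!r}")
    if period not in PERIOD_TO_UNIT:
        raise ValueError(f"period must be one of {sorted(PERIOD_TO_UNIT)}, got {period!r}")
    return timedelta(**{PERIOD_TO_UNIT[period]: count})


def thresholds_ordered(freshness: dict) -> bool:
    """Return True when warn_after is strictly shorter than error_after."""
    return to_timedelta(freshness["warn_after"]) < to_timedelta(freshness["error_after"])


option_c = {
    "warn_after": {"count": 2, "period": "hour"},
    "error_after": {"count": 4, "period": "hour"},
}
swapped = {
    "warn_after": {"count": 4, "period": "hour"},
    "error_after": {"count": 2, "period": "hour"},
}
print(thresholds_ordered(option_c))  # True
print(thresholds_ordered(swapped))   # False
```

Converting both thresholds to a common unit first means the comparison also works when the periods differ, for example 90 minutes versus 2 hours.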
