Source freshness checks in dbt - Step-by-Step Execution

Concept Flow - Source freshness checks
  1. Define the source in the dbt project
  2. Configure freshness criteria
  3. Run the dbt source freshness command
  4. dbt queries the source metadata
  5. dbt compares the source timestamp to the criteria
  6. dbt reports the freshness status
  7. Take action if the status is warn or error
This flow shows how dbt checks source data freshness by defining sources, setting freshness rules, running checks, and reporting results.
Execution Sample
dbt
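sources:
  - name: my_source
    tables:
      - name: users
        # Assumed timestamp column; dbt needs one unless the adapter supports metadata-based freshness.
        loaded_at_field: updated_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}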
This YAML config defines a source table with freshness thresholds for warnings and errors.
Execution Table
| Step | Action | Source Timestamp | Current Time | Age (hours) | Check Result | Message |
|---|---|---|---|---|---|---|
| 1 | Start freshness check | - | 2024-06-01 12:00 | - | Pending | Starting check for source 'users' |
| 2 | Query source metadata | 2024-06-01 01:30 | 2024-06-01 12:00 | 10.5 | Within threshold | Source data is fresh |
| 3 | Compare to warn_after (12h) | 10.5 < 12 | N/A | True | No warning | No freshness warning needed |
| 4 | Compare to error_after (24h) | 10.5 < 24 | N/A | True | No error | No freshness error needed |
| 5 | Finish check | - | - | - | Success | Source freshness check passed |
💡 Source timestamp age 10.5 hours is less than warn_after 12 hours, so freshness is good.
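The arithmetic in this trace can be reproduced in a few lines of Python. This is a minimal sketch of the calculation, not dbt's internal code: the variable names are illustrative, and the timestamps and thresholds are the ones used in the sample above.

```python
from datetime import datetime, timedelta

# Values taken from the execution table.
source_timestamp = datetime(2024, 6, 1, 1, 30)
current_time = datetime(2024, 6, 1, 12, 0)

# Thresholds from the YAML sample: warn after 12 hours, error after 24 hours.
warn_after = timedelta(hours=12)
error_after = timedelta(hours=24)

# Age of the data is simply "now" minus the latest load timestamp.
age = current_time - source_timestamp
age_hours = age.total_seconds() / 3600

print(f"age_hours = {age_hours}")      # age_hours = 10.5
print(f"warn?  {age > warn_after}")    # warn?  False
print(f"error? {age > error_after}")   # error? False
```

Both comparisons are false, which matches the "No warning" and "No error" results in steps 3 and 4.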
Variable Tracker
| Variable | Start | After Step 2 | After Step 3 | After Step 4 | Final |
|---|---|---|---|---|---|
| source_timestamp | - | 2024-06-01 01:30 | 2024-06-01 01:30 | 2024-06-01 01:30 | 2024-06-01 01:30 |
| current_time | 2024-06-01 12:00 | 2024-06-01 12:00 | 2024-06-01 12:00 | 2024-06-01 12:00 | 2024-06-01 12:00 |
| age_hours | - | 10.5 | 10.5 | 10.5 | 10.5 |
| check_result | Pending | Within threshold | No warning | No error | Success |
Key Moments - 2 Insights
Why does dbt compare the source timestamp age to both warn_after and error_after?
dbt uses warn_after to trigger a warning if data is getting old, and error_after to fail the check if data is too stale. This two-level check helps catch issues early (see execution_table rows 3 and 4).
What happens if the source timestamp is missing or null?
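If the timestamp is missing, dbt cannot calculate an age. A table with no `loaded_at_field` configured is typically skipped rather than checked (unless the adapter can use warehouse metadata), and a timestamp column that holds only null values usually leads to a failed check. Either way, freshness depends on having a valid timestamp (not shown in this trace but common in practice).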
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the age in hours of the source data at step 2?
A. 12
B. 24
C. 10.5
D. 1.5
💡 Hint
Check the 'Age (hours)' column at step 2 in the execution_table.
At which step does dbt decide that no freshness warning is needed?
A. Step 3
B. Step 2
C. Step 4
D. Step 5
💡 Hint
Look for the 'Check Result' and 'Message' columns mentioning warnings in the execution_table.
If the source timestamp was 13 hours old instead of 10.5, what would change in the execution table?
A. Step 5 would say freshness check failed
B. Step 3 would show a warning triggered
C. Step 4 would show an error triggered
D. No changes, still passes
💡 Hint
Compare age_hours to warn_after threshold in execution_table rows 3 and 4.
Concept Snapshot
Source freshness checks in dbt:
- Define sources and tables in YAML
- Set freshness thresholds: warn_after and error_after
- Run 'dbt source freshness' to check timestamps
- Compare source data age to thresholds
- Report status: fresh, warning, or error
- Helps monitor data timeliness automatically
Full Transcript
Source freshness checks in dbt help ensure your data is up-to-date. You define your source tables and set freshness rules like warn_after and error_after in your dbt project YAML. When you run the freshness check, dbt queries the source metadata to get the latest timestamp. It calculates how old the data is by comparing the source timestamp to the current time. Then it compares this age to your thresholds. If the data is younger than warn_after, it passes with no warnings. If it is older than warn_after but younger than error_after, dbt issues a warning. If it is older than error_after, it triggers an error. This process helps catch stale data early and keeps your analytics reliable.
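The three-way decision described in the transcript can be sketched as a small function. This is an illustration of the rule only, assuming ages and thresholds are expressed as `timedelta` values; the function name `freshness_status` is hypothetical, and the treatment of an age exactly equal to a threshold is an assumption rather than documented dbt behaviour.

```python
from datetime import timedelta

def freshness_status(age: timedelta, warn_after: timedelta, error_after: timedelta) -> str:
    """Return 'error', 'warn', or 'pass' for a given data age."""
    # Check the stricter threshold first so very stale data is not
    # reported as a mere warning.
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"

warn = timedelta(hours=12)
error = timedelta(hours=24)

print(freshness_status(timedelta(hours=10.5), warn, error))  # pass
print(freshness_status(timedelta(hours=13), warn, error))    # warn
print(freshness_status(timedelta(hours=30), warn, error))    # error
```

The 13-hour case corresponds to the third quiz question above: the data is older than warn_after but younger than error_after, so only a warning is raised.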

Practice

(1/5)
1. What is the main purpose of source freshness checks in dbt?
easy
A. To track how recent the data in your source tables is
B. To create new tables from raw data
C. To optimize SQL query performance
D. To schedule dbt runs automatically

Solution

  1. Step 1: Understand the role of freshness checks

    Freshness checks monitor the age of data in source tables to ensure it is up-to-date.
  2. Step 2: Compare options to the purpose

    Only option A, "To track how recent the data in your source tables is", describes tracking data recency, which matches the purpose of freshness checks.
  3. Final Answer:

    To track how recent the data in your source tables is -> Option A
  4. Quick Check:

    Freshness checks = track data recency [OK]
Hint: Freshness checks measure data age, not table creation or scheduling [OK]
Common Mistakes:
  • Confusing freshness checks with table creation
  • Thinking freshness checks optimize queries
  • Assuming freshness checks schedule runs
2. Which of the following is the correct way to set a freshness check with a warning threshold of 1 day and an error threshold of 2 days in dbt YAML?
easy
A. freshness: warn_after: 1 day error_after: 2 day
B. freshness: warn_after: {count: 1, period: day} error_after: {count: 2, period: day}
C. freshness: warn_after: '1 day' error_after: '2 days'
D. freshness: warn_after: {count: 2, period: day} error_after: {count: 1, period: day}

Solution

  1. Step 1: Recall correct YAML syntax for freshness

    dbt expects warn_after and error_after as objects with count and period keys.
  2. Step 2: Match options to syntax

    Option B, freshness: warn_after: {count: 1, period: day} error_after: {count: 2, period: day}, correctly uses the {count: X, period: day} format. Options A and C use plain values or strings instead of objects, and option D swaps the thresholds.
  3. Final Answer:

    freshness: warn_after: {count: 1, period: day} error_after: {count: 2, period: day} -> Option B
  4. Quick Check:

    Use count and period keys in YAML freshness [OK]
Hint: Use {count: X, period: day} format for freshness thresholds [OK]
Common Mistakes:
  • Using strings instead of objects for thresholds
  • Swapping warn_after and error_after values
  • Missing count or period keys
3. Given this freshness check result output, what is the status if the last loaded timestamp is 3 days ago, warn_after is 1 day, and error_after is 2 days?
{"status": "", "max_loaded_at": "2024-04-20T00:00:00Z"}
medium
A. error
B. warn
C. pass
D. unknown

Solution

  1. Step 1: Calculate data age from last loaded timestamp

    If today is 2024-04-23, data is 3 days old (2024-04-23 - 2024-04-20).
  2. Step 2: Compare data age to thresholds

    3 days > error_after (2 days), so status is error.
  3. Final Answer:

    error -> Option A
  4. Quick Check:

    Data age > error_after = error status [OK]
Hint: If data age > error_after, status is error [OK]
Common Mistakes:
  • Confusing warn_after and error_after thresholds
  • Assuming status is warn for data older than error_after
  • Ignoring current date when calculating age
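The age calculation in this solution can be checked by parsing the `max_loaded_at` value from the result snippet. This is a rough sketch: the "current" date of 2024-04-23 is the assumption made in the solution, and the threshold variables are illustrative names, not fields of dbt's output.

```python
import json
from datetime import datetime, timezone

result = json.loads('{"status": "", "max_loaded_at": "2024-04-20T00:00:00Z"}')

# Older Python versions do not accept a trailing "Z" in fromisoformat(),
# so rewrite it as an explicit UTC offset first.
max_loaded_at = datetime.fromisoformat(result["max_loaded_at"].replace("Z", "+00:00"))

# Assumed "now", matching the date used in the solution.
now = datetime(2024, 4, 23, tzinfo=timezone.utc)

age_days = (now - max_loaded_at).days
warn_after_days, error_after_days = 1, 2

if age_days > error_after_days:
    status = "error"
elif age_days > warn_after_days:
    status = "warn"
else:
    status = "pass"

print(age_days, status)  # 3 error
```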
4. You wrote this freshness check YAML but it fails to run:
sources:
  - name: my_source
    freshness:
      warn_after: {count: 1, period: day}
      error_after: {count: 2, period: days}
What is the likely cause of the error?
medium
A. The count values must be strings, not numbers
B. Missing quotes around the period values
C. warn_after and error_after keys are swapped
D. The period value 'days' should be singular 'day'

Solution

  1. Step 1: Check period values in freshness YAML

    dbt expects period values as singular strings like 'day', not plural 'days'.
  2. Step 2: Identify error cause

    Using 'days' causes a validation error; changing to 'day' fixes it.
  3. Final Answer:

    The period value 'days' should be singular 'day' -> Option D
  4. Quick Check:

    Period values must be singular like 'day' [OK]
Hint: Use singular period names like 'day', not 'days' [OK]
Common Mistakes:
  • Using plural period names
  • Swapping warn_after and error_after
  • Adding unnecessary quotes around numbers
5. You want to set up a freshness check for a source table that updates hourly. You want to warn if data is older than 2 hours and error if older than 4 hours. Which YAML snippet correctly sets this up?
hard
A. freshness: warn_after: {count: '2', period: hour} error_after: {count: '4', period: hour}
B. freshness: warn_after: {count: 2, period: hours} error_after: {count: 4, period: hours}
C. freshness: warn_after: {count: 2, period: hour} error_after: {count: 4, period: hour}
D. freshness: warn_after: {count: 4, period: hour} error_after: {count: 2, period: hour}

Solution

  1. Step 1: Identify correct period and count values

    Period should be singular 'hour', counts are numbers without quotes.
  2. Step 2: Check warn_after and error_after order

    warn_after must be less than error_after; 2 < 4 is correct.
  3. Step 3: Validate options

    freshness: warn_after: {count: 2, period: hour} error_after: {count: 4, period: hour} matches correct syntax and logic; A uses strings for counts, B uses plural 'hours', D swaps thresholds.
  4. Final Answer:

    freshness: warn_after: {count: 2, period: hour} error_after: {count: 4, period: hour} -> Option C
  5. Quick Check:

    Use singular period and correct threshold order [OK]
Hint: Use singular period and warn_after < error_after [OK]
Common Mistakes:
  • Using plural period names like 'hours'
  • Putting counts as strings instead of numbers
  • Swapping warn_after and error_after values
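The rules used in questions 4 and 5 (singular period names, numeric counts, warn_after shorter than error_after) can be collected into a small checker. This is a hypothetical helper that mirrors the rules stated in the solutions; it is not dbt's own validator, and the name `check_freshness` and the messages it returns are made up for illustration.

```python
# Period names accepted in a freshness block, with their length in minutes.
PERIOD_MINUTES = {"minute": 1, "hour": 60, "day": 1440}

def check_freshness(freshness: dict) -> list[str]:
    """Return a list of problems found in a parsed freshness block."""
    problems = []
    for key in ("warn_after", "error_after"):
        rule = freshness.get(key)
        if rule is None:
            continue
        count = rule.get("count")
        period = rule.get("period")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(count, bool) or not isinstance(count, int):
            problems.append(f"{key}.count should be an integer, got {count!r}")
        if period not in PERIOD_MINUTES:
            problems.append(f"{key}.period should be minute, hour, or day, got {period!r}")

    warn, error = freshness.get("warn_after"), freshness.get("error_after")
    # Only compare the thresholds once both rules are individually valid.
    if not problems and warn and error:
        warn_minutes = warn["count"] * PERIOD_MINUTES[warn["period"]]
        error_minutes = error["count"] * PERIOD_MINUTES[error["period"]]
        if warn_minutes >= error_minutes:
            problems.append("warn_after should be shorter than error_after")
    return problems

option_c = {"warn_after": {"count": 2, "period": "hour"},
            "error_after": {"count": 4, "period": "hour"}}
option_b = {"warn_after": {"count": 2, "period": "hours"},
            "error_after": {"count": 4, "period": "hours"}}

print(check_freshness(option_c))  # []
print(check_freshness(option_b))  # two messages about the plural period
```

Converting both thresholds to minutes before comparing them means a mix such as 90 minutes versus 2 hours is ordered correctly.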