Source freshness checks in dbt - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is a source freshness check in dbt?
A source freshness check in dbt is a way to monitor how up-to-date your source data is by checking the age of the newest data record against defined thresholds.
beginner
Which configuration key in dbt defines the freshness thresholds for a source?
The key freshness defines thresholds like warn_after and error_after to set limits on acceptable data age.
intermediate
What happens if the source data is older than the error_after threshold in a freshness check?
dbt will mark the freshness check as failed and raise an error, signaling that the source data is too old and may need attention.
intermediate
How do you define a freshness check for a source table in the sources.yml file?
You add a freshness block under the source (or under an individual table) with warn_after and error_after thresholds, and set loaded_at_field to the timestamp column dbt should check.
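A minimal sketch of such a block, assuming a hypothetical source named raw_shop with an orders table whose load timestamp is stored in a column called _loaded_at (all three names are illustrative only):

```yaml
version: 2

sources:
  - name: raw_shop                  # hypothetical source name
    loaded_at_field: _loaded_at     # hypothetical timestamp column; dbt reads its newest value
    freshness:
      warn_after: {count: 12, period: hour}    # warn once the newest row is over 12 hours old
      error_after: {count: 24, period: hour}   # error once it is over 24 hours old
    tables:
      - name: orders                # hypothetical table; inherits the source-level settings
```

Depending on the dbt version, these keys may need to sit under a config: block instead, so check the documentation for the release you run.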
beginner
Why are source freshness checks important in data pipelines?
They help ensure that data is updated on time, so downstream analysis and reports use fresh and reliable data, preventing decisions based on stale information.
What does the warn_after threshold in a freshness check do?
A. Automatically refreshes the source data
B. Stops the dbt run immediately
C. Deletes old data from the source
D. Triggers a warning if data is older than this time
Where do you define source freshness checks in dbt?
A. In the profiles.yml file
B. In the sources.yml file
C. In the model SQL files
D. In the dbt_project.yml file
What column type is typically used for freshness checks?
A. Integer column
B. Text column
C. Timestamp or date column
D. Boolean column
If a freshness check fails with an error, what should you do?
A. Investigate why the source data is stale and fix the data pipeline
B. Ignore the error and continue
C. Delete the source table
D. Change the warn_after threshold to a higher value
Which dbt command runs source freshness checks?
A. dbt source freshness
B. dbt run
C. dbt test
D. dbt compile
Explain how to set up a source freshness check in dbt and why it is useful.
Think about configuration and the purpose of freshness checks.
    Describe what happens when source data exceeds the error_after threshold in a freshness check.
    Focus on the consequences of stale data detection.

      Practice

      1. What is the main purpose of source freshness checks in dbt?
      easy
      A. To track how recent the data in your source tables is
      B. To create new tables from raw data
      C. To optimize SQL query performance
      D. To schedule dbt runs automatically

      Solution

      1. Step 1: Understand the role of freshness checks

        Freshness checks monitor the age of data in source tables to ensure it is up-to-date.
      2. Step 2: Compare options to the purpose

        Only option A, "To track how recent the data in your source tables is", describes tracking data recency, which matches the purpose of freshness checks.
      3. Final Answer:

        To track how recent the data in your source tables is -> Option A
      4. Quick Check:

        Freshness checks = track data recency [OK]
      Hint: Freshness checks measure data age, not table creation or scheduling
      Common Mistakes:
      • Confusing freshness checks with table creation
      • Thinking freshness checks optimize queries
      • Assuming freshness checks schedule runs
      2. Which of the following is the correct way to set a freshness check with a warning threshold of 1 day and an error threshold of 2 days in dbt YAML?
      easy
      A. freshness: warn_after: 1 day error_after: 2 day
      B. freshness: warn_after: {count: 1, period: day} error_after: {count: 2, period: day}
      C. freshness: warn_after: '1 day' error_after: '2 days'
      D. freshness: warn_after: {count: 2, period: day} error_after: {count: 1, period: day}

      Solution

      1. Step 1: Recall correct YAML syntax for freshness

        dbt expects warn_after and error_after as objects with count and period keys.
      2. Step 2: Match options to syntax

        Option B correctly uses the {count: X, period: day} format for both thresholds. Options A and C write the thresholds as plain text or strings instead of objects, and option D swaps the two thresholds.
      3. Final Answer:

        freshness: warn_after: {count: 1, period: day} error_after: {count: 2, period: day} -> Option B
      4. Quick Check:

        Use count and period keys in YAML freshness [OK]
      Hint: Use {count: X, period: day} format for freshness thresholds
      Common Mistakes:
      • Using strings instead of objects for thresholds
      • Swapping warn_after and error_after values
      • Missing count or period keys
      3. Given this freshness check result output, what is the status if the last loaded timestamp is 3 days ago, warn_after is 1 day, and error_after is 2 days?
      {"status": "", "max_loaded_at": "2024-04-20T00:00:00Z"}
      medium
      A. error
      B. warn
      C. pass
      D. unknown

      Solution

      1. Step 1: Calculate data age from last loaded timestamp

        If today is 2024-04-23, data is 3 days old (2024-04-23 - 2024-04-20).
      2. Step 2: Compare data age to thresholds

        3 days > error_after (2 days), so status is error.
      3. Final Answer:

        error -> Option A
      4. Quick Check:

        Data age > error_after = error status [OK]
      Hint: If data age > error_after, status is error
      Common Mistakes:
      • Confusing warn_after and error_after thresholds
      • Assuming status is warn for data older than error_after
      • Ignoring current date when calculating age
      4. You wrote this freshness check YAML but it fails to run:
      sources:
        - name: my_source
          freshness:
            warn_after: {count: 1, period: day}
            error_after: {count: 2, period: days}
      What is the likely cause of the error?
      medium
      A. The count values must be strings, not numbers
      B. Missing quotes around the period values
      C. warn_after and error_after keys are swapped
      D. The period value 'days' should be singular 'day'

      Solution

      1. Step 1: Check period values in freshness YAML

        dbt expects period values as singular strings like 'day', not plural 'days'.
      2. Step 2: Identify error cause

        Using 'days' causes a validation error; changing to 'day' fixes it.
      3. Final Answer:

        The period value 'days' should be singular 'day' -> Option D
      4. Quick Check:

        Period values must be singular like 'day' [OK]
      Hint: Use singular period names like 'day', not 'days'
      Common Mistakes:
      • Using plural period names
      • Swapping warn_after and error_after
      • Adding unnecessary quotes around numbers
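
      For reference, here is the snippet from the question with only the period value corrected. As in the question, the loaded_at_field and tables entries that a working source definition would also need are left out:

      ```yaml
      sources:
        - name: my_source
          freshness:
            warn_after: {count: 1, period: day}
            error_after: {count: 2, period: day}   # singular "day", not "days"
      ```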
      5. You want to set up a freshness check for a source table that updates hourly. You want to warn if data is older than 2 hours and error if older than 4 hours. Which YAML snippet correctly sets this up?
      hard
      A. freshness: warn_after: {count: '2', period: hour} error_after: {count: '4', period: hour}
      B. freshness: warn_after: {count: 2, period: hours} error_after: {count: 4, period: hours}
      C. freshness: warn_after: {count: 2, period: hour} error_after: {count: 4, period: hour}
      D. freshness: warn_after: {count: 4, period: hour} error_after: {count: 2, period: hour}

      Solution

      1. Step 1: Identify correct period and count values

        Period should be singular 'hour', counts are numbers without quotes.
      2. Step 2: Check warn_after and error_after order

        warn_after should be shorter than error_after so the warning fires before the error; 2 hours < 4 hours is correct.
      3. Step 3: Validate options

        Option C matches the correct syntax and logic. Option A uses strings for the counts, option B uses the plural 'hours', and option D swaps the thresholds.
      4. Final Answer:

        freshness: warn_after: {count: 2, period: hour} error_after: {count: 4, period: hour} -> Option C
      5. Quick Check:

        Use singular period and correct threshold order [OK]
      Hint: Use singular period and warn_after < error_after
      Common Mistakes:
      • Using plural period names like 'hours'
      • Putting counts as strings instead of numbers
      • Swapping warn_after and error_after values
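
      Written out in a sources file, the hourly setup from option C might look like the sketch below. The table name hourly_events and the column name loaded_at are hypothetical, and the freshness block is placed at the table level here only to show that a single table can carry its own thresholds:

      ```yaml
      sources:
        - name: my_source                   # source name reused from question 4
          tables:
            - name: hourly_events           # hypothetical table that loads every hour
              loaded_at_field: loaded_at    # hypothetical timestamp column
              freshness:
                warn_after: {count: 2, period: hour}    # warn after 2 hours without new data
                error_after: {count: 4, period: hour}   # error after 4 hours without new data
      ```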