Bird
Raised Fist0
dbtdata~20 mins

Source freshness checks in dbt - Practice Problems & Coding Challenges

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Challenge - 5 Problems
🎖️
Source Freshness Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
intermediate
2:00remaining
Output of freshness check query
Given the following SQL query used in a dbt source freshness check, what will be the output if the source table's max timestamp is '2024-06-01 10:00:00' and the current timestamp is '2024-06-01 12:00:00'?
dbt
SELECT
  MAX(updated_at) AS last_loaded,
  CURRENT_TIMESTAMP AS check_time,
  TIMESTAMPDIFF(MINUTE, MAX(updated_at), CURRENT_TIMESTAMP) AS minutes_behind
FROM source_schema.source_table;
ASyntaxError: TIMESTAMPDIFF function not found
B{"last_loaded": "2024-06-01 10:00:00", "check_time": "2024-06-01 12:00:00", "minutes_behind": 120}
C{"last_loaded": null, "check_time": "2024-06-01 12:00:00", "minutes_behind": null}
D{"last_loaded": "2024-06-01 12:00:00", "check_time": "2024-06-01 10:00:00", "minutes_behind": -120}
Attempts:
2 left
💡 Hint
Think about how TIMESTAMPDIFF calculates the difference between two timestamps.
🧠 Conceptual
intermediate
1:30remaining
Understanding freshness thresholds
In dbt source freshness checks, what does the 'warn_after' threshold represent?
AThe maximum allowed time difference before a warning is triggered
BThe exact time when the source data was last updated
CThe minimum time difference required to mark data as fresh
DThe time interval to refresh the source table automatically
Attempts:
2 left
💡 Hint
Think about what triggers a warning in freshness checks.
🔧 Debug
advanced
2:30remaining
Identify the error in freshness SQL
This SQL snippet is used in a dbt freshness check but raises an error. What is the cause? SELECT MAX(last_updated) AS last_loaded, CURRENT_TIMESTAMP AS check_time, DATEDIFF(MINUTES, MAX(last_updated), CURRENT_TIMESTAMP) AS minutes_behind FROM source_schema.source_table;
AMAX function cannot be used on timestamp columns
BCURRENT_TIMESTAMP is not valid SQL syntax
CDATEDIFF function does not accept 'MINUTES' as a unit
DMissing GROUP BY clause causes error
Attempts:
2 left
💡 Hint
Check the syntax and parameters of the DATEDIFF function.
data_output
advanced
2:00remaining
Result of freshness check with null timestamps
What will be the output of this freshness check query if the source table has no rows (empty)? SELECT MAX(updated_at) AS last_loaded, CURRENT_TIMESTAMP AS check_time, TIMESTAMPDIFF(MINUTE, MAX(updated_at), CURRENT_TIMESTAMP) AS minutes_behind FROM source_schema.source_table;
ARuntimeError: Cannot compute TIMESTAMPDIFF with null values
B{"last_loaded": null, "check_time": "2024-06-01 12:00:00", "minutes_behind": 0}
C{"last_loaded": "1970-01-01 00:00:00", "check_time": "2024-06-01 12:00:00", "minutes_behind": 0}
D{"last_loaded": null, "check_time": "2024-06-01 12:00:00", "minutes_behind": null}
Attempts:
2 left
💡 Hint
Consider how aggregate functions behave on empty tables.
🚀 Application
expert
3:00remaining
Designing a freshness check for multiple sources
You have three source tables with different update frequencies. You want to configure dbt freshness checks with these thresholds: - source A: warn if data is older than 30 minutes, error if older than 1 hour - source B: warn if data is older than 2 hours, error if older than 4 hours - source C: warn if data is older than 1 day, error if older than 2 days Which YAML configuration snippet correctly sets these freshness checks?
dbt
version: 2
sources:
  - name: source_a
    freshness:
      warn_after: {count: 30, period: minute}
      error_after: {count: 1, period: hour}
  - name: source_b
    freshness:
      warn_after: {count: 2, period: hour}
      error_after: {count: 4, period: hour}
  - name: source_c
    freshness:
      warn_after: {count: 1, period: day}
      error_after: {count: 2, period: day}
A
version: 2
sources:
  - name: source_a
    freshness:
      warn_after: {count: 30, period: minute}
      error_after: {count: 1, period: hour}
  - name: source_b
    freshness:
      warn_after: {count: 2, period: hour}
      error_after: {count: 4, period: hour}
  - name: source_c
    freshness:
      warn_after: {count: 1, period: day}
      error_after: {count: 2, period: day}
B
version: 2
sources:
  - name: source_a
    freshness:
      warn_after: {count: 1, period: hour}
      error_after: {count: 30, period: minute}
  - name: source_b
    freshness:
      warn_after: {count: 4, period: hour}
      error_after: {count: 2, period: hour}
  - name: source_c
    freshness:
      warn_after: {count: 2, period: day}
      error_after: {count: 1, period: day}
C
version: 2
sources:
  - name: source_a
    freshness:
      warn_after: {count: 30, period: hour}
      error_after: {count: 1, period: minute}
  - name: source_b
    freshness:
      warn_after: {count: 2, period: minute}
      error_after: {count: 4, period: minute}
  - name: source_c
    freshness:
      warn_after: {count: 1, period: hour}
      error_after: {count: 2, period: day}
D
version: 2
sources:
  - name: source_a
    freshness:
      warn_after: {count: 1, period: minute}
      error_after: {count: 30, period: hour}
  - name: source_b
    freshness:
      warn_after: {count: 2, period: day}
      error_after: {count: 4, period: day}
  - name: source_c
    freshness:
      warn_after: {count: 1, period: hour}
      error_after: {count: 2, period: hour}
Attempts:
2 left
💡 Hint
Check that warn_after is less than error_after and units match the requirements.

Practice

(1/5)
1. What is the main purpose of source freshness checks in dbt?
easy
A. To track how recent the data in your source tables is
B. To create new tables from raw data
C. To optimize SQL query performance
D. To schedule dbt runs automatically

Solution

  1. Step 1: Understand the role of freshness checks

    Freshness checks monitor the age of data in source tables to ensure it is up-to-date.
  2. Step 2: Compare options to the purpose

    Only To track how recent the data in your source tables is describes tracking data recency, which matches the purpose of freshness checks.
  3. Final Answer:

    To track how recent the data in your source tables is -> Option A
  4. Quick Check:

    Freshness checks = track data recency [OK]
Hint: Freshness checks measure data age, not table creation or scheduling [OK]
Common Mistakes:
  • Confusing freshness checks with table creation
  • Thinking freshness checks optimize queries
  • Assuming freshness checks schedule runs
2. Which of the following is the correct way to set a freshness check with a warning threshold of 1 day and an error threshold of 2 days in dbt YAML?
easy
A. freshness: warn_after: 1 day error_after: 2 day
B. freshness: warn_after: {count: 1, period: day} error_after: {count: 2, period: day}
C. freshness: warn_after: '1 day' error_after: '2 days'
D. freshness: warn_after: {count: 2, period: day} error_after: {count: 1, period: day}

Solution

  1. Step 1: Recall correct YAML syntax for freshness

    dbt expects warn_after and error_after as objects with count and period keys.
  2. Step 2: Match options to syntax

    freshness: warn_after: {count: 1, period: day} error_after: {count: 2, period: day} correctly uses {count: X, period: day} format; others use incorrect formats or swap thresholds.
  3. Final Answer:

    freshness: warn_after: {count: 1, period: day} error_after: {count: 2, period: day} -> Option B
  4. Quick Check:

    Use count and period keys in YAML freshness [OK]
Hint: Use {count: X, period: day} format for freshness thresholds [OK]
Common Mistakes:
  • Using strings instead of objects for thresholds
  • Swapping warn_after and error_after values
  • Missing count or period keys
3. Given this freshness check result output, what is the status if the last loaded timestamp is 3 days ago, warn_after is 1 day, and error_after is 2 days?
{"status": "", "max_loaded_at": "2024-04-20T00:00:00Z"}
medium
A. error
B. warn
C. pass
D. unknown

Solution

  1. Step 1: Calculate data age from last loaded timestamp

    If today is 2024-04-23, data is 3 days old (2024-04-23 - 2024-04-20).
  2. Step 2: Compare data age to thresholds

    3 days > error_after (2 days), so status is error.
  3. Final Answer:

    error -> Option A
  4. Quick Check:

    Data age > error_after = error status [OK]
Hint: If data age > error_after, status is error [OK]
Common Mistakes:
  • Confusing warn_after and error_after thresholds
  • Assuming status is warn for data older than error_after
  • Ignoring current date when calculating age
4. You wrote this freshness check YAML but it fails to run:
sources:
  - name: my_source
    freshness:
      warn_after: {count: 1, period: day}
      error_after: {count: 2, period: days}
What is the likely cause of the error?
medium
A. The count values must be strings, not numbers
B. Missing quotes around the period values
C. warn_after and error_after keys are swapped
D. The period value 'days' should be singular 'day'

Solution

  1. Step 1: Check period values in freshness YAML

    dbt expects period values as singular strings like 'day', not plural 'days'.
  2. Step 2: Identify error cause

    Using 'days' causes a validation error; changing to 'day' fixes it.
  3. Final Answer:

    The period value 'days' should be singular 'day' -> Option D
  4. Quick Check:

    Period values must be singular like 'day' [OK]
Hint: Use singular period names like 'day', not 'days' [OK]
Common Mistakes:
  • Using plural period names
  • Swapping warn_after and error_after
  • Adding unnecessary quotes around numbers
5. You want to set up a freshness check for a source table that updates hourly. You want to warn if data is older than 2 hours and error if older than 4 hours. Which YAML snippet correctly sets this up?
hard
A. freshness: warn_after: {count: '2', period: hour} error_after: {count: '4', period: hour}
B. freshness: warn_after: {count: 2, period: hours} error_after: {count: 4, period: hours}
C. freshness: warn_after: {count: 2, period: hour} error_after: {count: 4, period: hour}
D. freshness: warn_after: {count: 4, period: hour} error_after: {count: 2, period: hour}

Solution

  1. Step 1: Identify correct period and count values

    Period should be singular 'hour', counts are numbers without quotes.
  2. Step 2: Check warn_after and error_after order

    warn_after must be less than error_after; 2 < 4 is correct.
  3. Step 3: Validate options

    freshness: warn_after: {count: 2, period: hour} error_after: {count: 4, period: hour} matches correct syntax and logic; A uses strings for counts, B uses plural 'hours', D swaps thresholds.
  4. Final Answer:

    freshness: warn_after: {count: 2, period: hour} error_after: {count: 4, period: hour} -> Option C
  5. Quick Check:

    Use singular period and correct threshold order [OK]
Hint: Use singular period and warn_after < error_after [OK]
Common Mistakes:
  • Using plural period names like 'hours'
  • Putting counts as strings instead of numbers
  • Swapping warn_after and error_after values