Challenge - 5 Problems
Source Freshness Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate2:00remaining
Output of freshness check query
Given the following SQL query used in a dbt source freshness check, what will be the output if the source table's max timestamp is '2024-06-01 10:00:00' and the current timestamp is '2024-06-01 12:00:00'?
dbt
SELECT MAX(updated_at) AS last_loaded, CURRENT_TIMESTAMP AS check_time, TIMESTAMPDIFF(MINUTE, MAX(updated_at), CURRENT_TIMESTAMP) AS minutes_behind FROM source_schema.source_table;
Attempts:
2 left
💡 Hint
Think about how TIMESTAMPDIFF calculates the difference between two timestamps.
✗ Incorrect
The query calculates the maximum updated_at timestamp from the source table and compares it to the current timestamp. Since the max updated_at is 2 hours before current time, minutes_behind is 120.
🧠 Conceptual
intermediate1:30remaining
Understanding freshness thresholds
In dbt source freshness checks, what does the 'warn_after' threshold represent?
Attempts:
2 left
💡 Hint
Think about what triggers a warning in freshness checks.
✗ Incorrect
'warn_after' sets the time limit after which dbt will warn that the source data might be stale.
🔧 Debug
advanced2:30remaining
Identify the error in freshness SQL
This SQL snippet is used in a dbt freshness check but raises an error. What is the cause?
SELECT MAX(last_updated) AS last_loaded,
CURRENT_TIMESTAMP AS check_time,
DATEDIFF(MINUTES, MAX(last_updated), CURRENT_TIMESTAMP) AS minutes_behind
FROM source_schema.source_table;
Attempts:
2 left
💡 Hint
Check the syntax and parameters of the DATEDIFF function.
✗ Incorrect
DATEDIFF usually accepts 'DAY' or similar units, but 'MINUTES' is not valid in many SQL dialects. TIMESTAMPDIFF is preferred for minutes.
❓ data_output
advanced2:00remaining
Result of freshness check with null timestamps
What will be the output of this freshness check query if the source table has no rows (empty)?
SELECT
MAX(updated_at) AS last_loaded,
CURRENT_TIMESTAMP AS check_time,
TIMESTAMPDIFF(MINUTE, MAX(updated_at), CURRENT_TIMESTAMP) AS minutes_behind
FROM source_schema.source_table;
Attempts:
2 left
💡 Hint
Consider how aggregate functions behave on empty tables.
✗ Incorrect
MAX on an empty column returns null, so minutes_behind calculation also results in null.
🚀 Application
expert3:00remaining
Designing a freshness check for multiple sources
You have three source tables with different update frequencies. You want to configure dbt freshness checks with these thresholds:
- source A: warn if data is older than 30 minutes, error if older than 1 hour
- source B: warn if data is older than 2 hours, error if older than 4 hours
- source C: warn if data is older than 1 day, error if older than 2 days
Which YAML configuration snippet correctly sets these freshness checks?
dbt
version: 2 sources: - name: source_a freshness: warn_after: {count: 30, period: minute} error_after: {count: 1, period: hour} - name: source_b freshness: warn_after: {count: 2, period: hour} error_after: {count: 4, period: hour} - name: source_c freshness: warn_after: {count: 1, period: day} error_after: {count: 2, period: day}
Attempts:
2 left
💡 Hint
Check that warn_after is less than error_after and units match the requirements.
✗ Incorrect
Option A correctly sets warn_after and error_after with proper counts and periods matching the problem statement.