Source Freshness Checks with dbt
📖 Scenario: You work as a data analyst at a company that relies on daily data updates from various sources. To keep your reports accurate, you need to check how fresh the data in your source tables is. Using dbt, you will set up source freshness checks to monitor the last update time of your data sources.
🎯 Goal: Build a dbt project that defines sources and configures freshness checks to monitor the last update timestamps of source tables.
📋 What You'll Learn
Create a source definition for a table named orders in the raw schema
Add a freshness check configuration with warn_after and error_after thresholds
Run the freshness check and output the results
💡 Why This Matters
🌍 Real World
Data teams use source freshness checks to ensure their reports and models rely on up-to-date data, preventing decisions based on stale information.
💼 Career
Knowing how to configure and run source freshness checks is essential for data analysts and engineers to maintain data quality and trust in analytics pipelines.
1
Define the source for the orders table
Create a source named raw with a table called orders. The table has a freshness timestamp column named last_updated. Write this in a YAML file named sources.yml.
Hint
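Use the sources key and define name and tables, and set loaded_at_field to the last_updated column so dbt knows which timestamp to check. The freshness block with warn_after and error_after thresholds is configured in the next step.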
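A minimal sketch of what sources.yml could look like for this step. It assumes the long-standing layout in which loaded_at_field sits directly on the table entry; the exact placement can vary by dbt version, so check the documentation for the version you run.

```yaml
# sources.yml (sketch; key placement may differ in newer dbt versions)
version: 2

sources:
  - name: raw                           # source name; the schema defaults to this name
    tables:
      - name: orders                    # source table to monitor
        loaded_at_field: last_updated   # timestamp column dbt compares against the current time
```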
2
Configure freshness thresholds
In the sources.yml file, set the freshness check to warn if data is older than 24 hours and error if older than 48 hours. Use warn_after and error_after with count and period keys.
Hint
Use warn_after and error_after with count and period to set thresholds.
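The thresholds described in this step can be sketched as follows. This assumes the freshness block is placed on the orders table next to loaded_at_field; it could also be set at the source level so that every table in the source inherits it.

```yaml
# sources.yml (sketch; warn at 24 hours, error at 48 hours)
version: 2

sources:
  - name: raw
    tables:
      - name: orders
        loaded_at_field: last_updated
        freshness:
          warn_after: {count: 24, period: hour}    # warn once the data is older than 24 hours
          error_after: {count: 48, period: hour}   # error once the data is older than 48 hours
```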
3
Run the freshness check in dbt
In your terminal, run dbt source freshness to check the freshness of your sources.
Hint
Use the command dbt source freshness to run freshness checks.
4
Display the freshness check results
After running dbt source freshness, print the output showing the freshness status of the orders source table.
Hint
Look at the terminal output after running dbt source freshness to see the freshness status.
Practice
1. What is the main purpose of source freshness checks in dbt?
easy
A. To track how recent the data in your source tables is
B. To create new tables from raw data
C. To optimize SQL query performance
D. To schedule dbt runs automatically
Solution
Step 1: Understand the role of freshness checks
Freshness checks monitor the age of data in source tables to ensure it is up-to-date.
Step 2: Compare options to the purpose
Only option A, "To track how recent the data in your source tables is", describes tracking data recency, which matches the purpose of freshness checks.
Final Answer:
To track how recent the data in your source tables is -> Option A
Quick Check:
Freshness checks = track data recency [OK]
Hint: Freshness checks measure data age, not table creation or scheduling [OK]
Common Mistakes:
Confusing freshness checks with table creation
Thinking freshness checks optimize queries
Assuming freshness checks schedule runs
2. Which of the following is the correct way to set a freshness check with a warning threshold of 1 day and an error threshold of 2 days in dbt YAML?
easy
A. freshness:
     warn_after: 1 day
     error_after: 2 day
Hint: Use {count: X, period: day} format for freshness thresholds [OK]
Common Mistakes:
Using strings instead of objects for thresholds
Swapping warn_after and error_after values
Missing count or period keys
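As an illustration of the object format named in the hint, a 1-day warning and 2-day error threshold could be written as below. This is a fragment that would sit under a source or table entry; it uses block-style mappings, which are equivalent to the inline {count: X, period: day} form.

```yaml
# Fragment (sketch): belongs under a source or table definition
freshness:
  warn_after:
    count: 1       # number of periods
    period: day    # singular: minute, hour, or day
  error_after:
    count: 2
    period: day
```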
3. Given this freshness check result output, what is the status if the last loaded timestamp is 3 days ago, warn_after is 1 day, and error_after is 2 days?
D. The period value 'days' should be singular 'day'
Solution
Step 1: Check period values in freshness YAML
dbt expects period values as singular strings like 'day', not plural 'days'.
Step 2: Identify error cause
Using 'days' causes a validation error; changing to 'day' fixes it.
Final Answer:
The period value 'days' should be singular 'day' -> Option D
Quick Check:
Period values must be singular like 'day' [OK]
Hint: Use singular period names like 'day', not 'days' [OK]
Common Mistakes:
Using plural period names
Swapping warn_after and error_after
Adding unnecessary quotes around numbers
5. You want to set up a freshness check for a source table that updates hourly. You want to warn if data is older than 2 hours and error if older than 4 hours. Which YAML snippet correctly sets this up?