Source freshness checks in dbt - Deep Dive

Overview - Source freshness checks

What is it?

Source freshness checks monitor how up to date your data sources are. They tell you whether the data behind your analysis or reports is fresh or stale. dbt does this by checking a timestamp column on the source table, before the data flows into your models, so you can trust the data's timeliness.

Why it matters

Without source freshness checks, you might make decisions based on old or stale data without realizing it. This can lead to wrong conclusions or missed opportunities. Freshness checks give you confidence that your data reflects the latest information, which is crucial for accurate analysis and business decisions.

Where it fits

Before learning source freshness checks, you should understand basic dbt concepts like models, sources, and tests. After mastering freshness checks, you can explore advanced data quality monitoring and alerting systems to automate data reliability.

Mental Model

Core Idea

Source freshness checks measure how recent the data in your source tables is to ensure your analysis uses up-to-date information.

Think of it like...

It's like checking the expiration date on food before eating it to make sure it's still good and safe.

┌───────────────────────────────┐
│        Source Table           │
│  ┌─────────────────────────┐  │
│  │ Timestamp of last update│  │
│  └─────────────────────────┘  │
│               │               │
│               ▼               │
│  Freshness Check compares     │
│  current time with timestamp  │
│               │               │
│               ▼               │
│  Result: Fresh or Stale       │
└───────────────────────────────┘

Build-Up - 6 Steps

Foundation: Understanding data freshness basics

Concept: Learn what data freshness means and why it matters in data analysis.

Data freshness refers to how recently data was updated or loaded. Fresh data means it reflects the latest changes, while stale data is old and may not be reliable. For example, a sales report using yesterday's data might miss today's sales. Freshness is often tracked using timestamps that record when data was last updated.

Result

You understand that freshness is about data recency and its impact on trustworthiness.

Knowing what freshness means helps you realize why checking data update times is important before analysis.

Foundation: Introduction to dbt sources and timestamps

Intermediate: Configuring freshness checks in dbt

Intermediate: Interpreting freshness check results

Advanced: Automating freshness checks in CI/CD pipelines

Expert: Handling complex freshness scenarios and limitations

Under the Hood

dbt runs a query on the source table to find the maximum value of the timestamp column specified for freshness (loaded_at_field). It then compares this timestamp to the current time and, based on the configured warn_after and error_after thresholds, reports a status of pass, warn, or error. This happens at runtime when you execute 'dbt source freshness'.
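The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not dbt's actual implementation: the function name `classify_freshness` is hypothetical, and the exact boundary behavior (strictly greater than versus greater than or equal to a threshold) is an assumption.

```python
from datetime import datetime, timedelta, timezone


def classify_freshness(max_loaded_at, now, warn_after, error_after):
    """Return 'pass', 'warn', or 'error' based on the age of the newest row.

    Hypothetical helper: it mirrors the idea of comparing max(timestamp)
    to the current time, not dbt's internal code.
    """
    age = now - max_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


# Fixed "current time" so the example is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
warn_after = timedelta(hours=1)
error_after = timedelta(hours=2)

for minutes_old in (30, 90, 150):
    newest_row = now - timedelta(minutes=minutes_old)
    print(minutes_old, classify_freshness(newest_row, now, warn_after, error_after))
```

With thresholds of one and two hours, data that is 30 minutes old passes, 90 minutes old warns, and 150 minutes old errors.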

Why designed this way?

This design uses existing data timestamps to avoid extra overhead or complex tracking systems. It leverages the database's native capabilities for efficient checks. Alternatives like event-based freshness tracking are more complex and less portable across systems.

┌───────────────┐      ┌─────────────────────┐      ┌─────────────┐
│ Source Table  │─────▶│ Query max(timestamp)│─────▶│ Compare to  │
│ with updated  │      │                     │      │ current time│
│ timestamp col │      └─────────────────────┘      └─────────────┘
│               │                                      │
└───────────────┘                                      ▼
                                                  ┌─────────────┐
                                                  │ Freshness   │
                                                  │ Status:     │
                                                  │ pass/warn/  │
                                                  │ error       │
                                                  └─────────────┘

Myth Busters - 4 Common Misconceptions

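Quick: Does a passing freshness check guarantee the data is accurate? Commit yes or no.

Common Belief: If freshness checks pass, the data is always correct and reliable.

Reality: A passing check only shows that the data is recent. It says nothing about whether the values are correct or complete.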

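Quick: Do freshness checks run automatically without setup in dbt? Commit yes or no.

Common Belief: dbt automatically checks freshness for all sources without extra configuration.

Reality: Freshness checks require explicit configuration: a timestamp column (loaded_at_field) and freshness thresholds on the source.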

Quick: Can freshness checks detect data errors like wrong values? Commit yes or no.

Common Belief: Freshness checks will catch any data errors or inconsistencies.

Reality: Freshness checks measure recency only. Wrong values need other tests, such as uniqueness or null checks.

Quick: Can timezone differences affect freshness check results? Commit yes or no.

Common Belief: Timezone differences do not impact freshness check accuracy.

Reality: A timezone mismatch between the stored timestamps and the comparison time can cause false freshness alerts.

Expert Zone

Freshness thresholds should reflect business context; some data can tolerate longer delays without harm.

Combining freshness checks with other tests like uniqueness or null checks provides a fuller data quality picture.

Handling timezone-aware timestamps correctly avoids false positives in freshness alerts, especially in global systems.

When NOT to use

Do not rely on freshness checks alone when data correctness or completeness is critical; use additional data quality tests and validation tools. For event-driven or streaming data, consider real-time monitoring systems instead.

Production Patterns

In production, teams integrate freshness checks into nightly batch jobs and alerting systems. They often combine freshness with SLA monitoring and use dashboards to track data health over time.
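As a rough sketch of how a nightly job might feed an alerting system, the snippet below filters freshness results for sources that should trigger an alert. It assumes the results have already been loaded into a dictionary shaped like dbt's `sources.json` artifact, with a `results` list whose entries carry `unique_id` and `status`; verify these field names against your dbt version. The function name `stale_sources` and the sample source names are hypothetical.

```python
def stale_sources(artifact, fail_on=("error",)):
    """Return (unique_id, status) pairs for sources whose status is in fail_on.

    Hypothetical helper; the 'results', 'unique_id', and 'status' keys are
    assumed to match the freshness artifact produced by your dbt version.
    """
    return [
        (result.get("unique_id"), result.get("status"))
        for result in artifact.get("results", [])
        if result.get("status") in fail_on
    ]


# Made-up sample payload for illustration only.
sample_artifact = {
    "results": [
        {"unique_id": "source.my_project.sales_data.daily_sales", "status": "pass"},
        {"unique_id": "source.my_project.sales_data.refunds", "status": "warn"},
        {"unique_id": "source.my_project.sales_data.customers", "status": "error"},
    ]
}

# Alert only on errors (the default), or on warnings as well.
print(stale_sources(sample_artifact))
print(stale_sources(sample_artifact, fail_on=("warn", "error")))
```

A scheduled job could call a helper like this after running 'dbt source freshness' and send a notification when the returned list is not empty.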

Connections

Data Quality Testing

Builds-on

Understanding freshness checks helps grasp broader data quality testing, which includes accuracy, completeness, and consistency.

Continuous Integration/Continuous Deployment (CI/CD)

Same pattern

Automating freshness checks in CI/CD pipelines applies the same principles of automated testing and deployment used in software engineering.

Supply Chain Inventory Management

Analogous process

Just like freshness checks ensure data is up-to-date, inventory management tracks stock freshness to avoid selling expired goods, showing a cross-domain pattern of freshness monitoring.

Common Pitfalls

#1 Not configuring freshness checks in dbt source files.

Wrong approach:
sources:
  - name: sales_data
    tables:
      - name: daily_sales
        # Missing freshness block here

Correct approach:
sources:
  - name: sales_data
    tables:
      - name: daily_sales
        freshness:
          warn_after: {count: 1, period: hour}
          error_after: {count: 2, period: hour}
        loaded_at_field: updated_at

Root cause: Assuming freshness checks run automatically without explicit configuration.

#2 Using the wrong timestamp column for the freshness check.

Wrong approach:
loaded_at_field: created_date  # This column does not update on data refresh

Correct approach:
loaded_at_field: updated_at  # Column that reflects the last update time

Root cause: Confusing creation time with last update time, so the check no longer reflects when the data was actually refreshed.

#3 Ignoring timezone differences in timestamp comparisons.

Wrong approach:
# Timestamps and the comparison time are in different timezones (for example, local time vs UTC)
loaded_at_field: updated_at
freshness:
  warn_after: {count: 1, period: hour}
  error_after: {count: 2, period: hour}

Correct approach:
# Make sure the timestamps and the comparison time use the same timezone, or convert them.
# Handle the timezone explicitly in the source data or in the dbt config.

Root cause: Not accounting for a timezone mismatch causes false freshness alerts.
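To see how large the error from a timezone mismatch can be, here is a minimal Python sketch. It assumes a row that was loaded just now but whose timestamp was stored as naive local wall-clock time at UTC-5. The helper `age_in_utc` and its `source_offset_hours` parameter are hypothetical names used only for this illustration.

```python
from datetime import datetime, timedelta, timezone


def age_in_utc(loaded_at, now_utc, source_offset_hours=0):
    """Age of a timestamp relative to now_utc.

    Naive timestamps are interpreted using the given UTC offset
    (0 means "treat the stored value as UTC").
    """
    if loaded_at.tzinfo is None:
        source_tz = timezone(timedelta(hours=source_offset_hours))
        loaded_at = loaded_at.replace(tzinfo=source_tz)
    return now_utc - loaded_at


now_utc = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# The row was loaded at 12:00 UTC, but stored as local time (UTC-5): 07:00.
stored = datetime(2024, 1, 1, 7, 0)

wrong = age_in_utc(stored, now_utc)                          # treats 07:00 as UTC
right = age_in_utc(stored, now_utc, source_offset_hours=-5)  # applies the real offset

print(wrong)  # 5:00:00
print(right)  # 0:00:00
```

With a two-hour error threshold, the uncorrected value would raise a false alert even though the data is fully up to date.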

Key Takeaways

Source freshness checks verify how recent your data is to ensure timely and reliable analysis.

In dbt, freshness checks require explicit configuration specifying which timestamp column to use and freshness thresholds.

Freshness checks only measure data recency, not correctness or completeness, so they should be combined with other data quality tests.

Automating freshness checks in pipelines helps catch stale data early and maintain trust in your data systems.

Understanding limitations like timezone issues and irregular updates prevents false alerts and improves monitoring accuracy.

Practice

1. What is the main purpose of source freshness checks in dbt?

easy

A. To track how recent the data in your source tables is

B. To create new tables from raw data

C. To optimize SQL query performance

D. To schedule dbt runs automatically

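Solution

Step 1: Understand the role of freshness checks. They measure how recently the data in a source table was updated.

Step 2: Compare options to the purpose. Only option A is about data recency; B, C, and D describe building tables, tuning queries, and scheduling runs.

Final Answer: A. To track how recent the data in your source tables is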
