
Why sources define raw data contracts in dbt - Visual Breakdown


Concept Flow - Why sources define raw data contracts

Raw Data Source

↓

Define Raw Data Contract

↓

Set Expectations: Schema, Types, Quality

↓

Data Consumers Use Contract

↓

Detect Changes or Errors Early

↓

Maintain Data Reliability

This flow shows how defining raw data contracts sets clear rules for raw data, helping users trust and use data safely.
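The "Set Expectations" step can be sketched in source YAML. This is a minimal example, and the columns `transaction_id`, `amount`, and `status` (and their allowed values) are hypothetical. `description` and `data_type` document the expected schema. Tests such as `not_null`, `unique`, and `accepted_values` are the parts dbt actually checks when `dbt test` runs. On a source, `data_type` is descriptive only. Enforced column types (`contract: enforced`) apply to models, not sources.

```yaml
version: 2

sources:
  - name: raw_sales
    tables:
      - name: transactions
        description: "Raw sales transactions loaded by the ingestion pipeline"
        columns:
          # Column names, types, and allowed values below are hypothetical
          - name: transaction_id
            data_type: integer          # documents the expected type (not enforced on sources)
            tests:
              - not_null
              - unique
          - name: amount
            data_type: numeric
            tests:
              - not_null
          - name: status
            data_type: varchar
            tests:
              - accepted_values:
                  values: ['completed', 'refunded', 'cancelled']
```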

Execution Sample

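```yaml
version: 2

sources:
  - name: raw_sales
    tables:
      - name: transactions
        # Column that freshness is measured against
        # (the name _loaded_at is an assumption for illustration)
        loaded_at_field: _loaded_at
        freshness:
          warn_after:
            count: 24
            period: hour
```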

This dbt source config defines a raw data contract for the 'transactions' table with freshness expectations.
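Freshness expectations can also escalate from a warning to an error. This sketch assumes a hypothetical `_loaded_at` timestamp column and a hypothetical `refunds` table. `loaded_at_field` tells dbt which column to compare with the current time. `warn_after` and `error_after` set two thresholds, and `dbt source freshness` reports pass, warn, or error for each table. Freshness set at the source level applies to every table, and an individual table can override it.

```yaml
version: 2

sources:
  - name: raw_sales
    # Source-level defaults apply to every table below
    loaded_at_field: _loaded_at        # hypothetical load-timestamp column
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: transactions             # inherits the 12h warn / 24h error thresholds
      - name: refunds                  # hypothetical table with looser, overridden thresholds
        freshness:
          warn_after: {count: 2, period: day}
          error_after: {count: 3, period: day}
```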

Execution Table

Step	Action	Evaluation	Result
1	Define source 'raw_sales.transactions'	Set schema and freshness rules	Contract established for raw data
2	Data pipeline loads raw data	dbt source freshness and dbt test check the data against the contract	Data matches schema and freshness
3	Data consumer queries source	Relies on the contract to trust the data	Reliable data used in models
4	Raw data changes unexpectedly	A freshness check or source test fails	Warning or error raised for investigation
5	Fix data or update contract	Restore contract compliance	Data reliability maintained
6	End	All checks passed or issues resolved	Data pipeline stable

💡 Execution stops when the data contract is met, or when violations are detected and addressed.
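Steps 4 and 5 depend on how strictly each check is enforced. This minimal sketch assumes a hypothetical `status` column. dbt's `severity` config lets a failing test either fail the run (`error`, the default) or only report a warning (`warn`). That is one way to decide which contract violations block the pipeline and which are flagged for investigation.

```yaml
version: 2

sources:
  - name: raw_sales
    tables:
      - name: transactions
        columns:
          - name: transaction_id        # hypothetical column
            tests:
              - not_null                # default severity (error): a failure fails the run
          - name: status                # hypothetical column
            tests:
              - accepted_values:
                  values: ['completed', 'refunded']
                  config:
                    severity: warn      # unexpected values are reported but do not fail the run
```

If a new status value turns out to be legitimate, updating the contract (step 5) is a one-line change to the `values` list.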

Variable Tracker

Variable	Start	After Step 2	After Step 4	Final
Data Contract Status	Not defined	Active and valid	Violation detected	Restored or updated

Key Moments - 2 Insights

Why do we define a raw data contract instead of just trusting the raw data?

What happens if the raw data changes but the contract is not updated?
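A concrete case of the second question is a hypothetical upstream rename. The tests in this sketch still point at the old column name. The compiled test query therefore references a column that no longer exists, and dbt reports the test as an error instead of silently passing. The sketch also shows a limit: source tests only check what is declared, so a brand-new upstream column goes unnoticed.

```yaml
version: 2

sources:
  - name: raw_sales
    tables:
      - name: transactions
        columns:
          # Suppose the upstream system renames amount -> total_amount (hypothetical).
          # This test still targets the old name, so its query references a missing
          # column and dbt reports the test as an error.
          - name: amount
            tests:
              - not_null
          # A new upstream column that is not listed here is not flagged by
          # these tests; source YAML does not check for extra columns.
```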

Visual Quiz - 3 Questions

Test your understanding

Looking at the execution table, what is the result after step 3?

A. Data consumer uses unreliable data

B. Contract is violated

C. Data consumer uses reliable data

D. Data pipeline stops

Concept Snapshot

Define raw data contracts in dbt sources to set clear rules on schema and freshness.
Contracts help detect data issues early.
They let data consumers trust raw data.
Violations surface as freshness warnings or test failures, prompting quick fixes.
Together, these checks maintain overall data reliability.

Full Transcript

In dbt, defining raw data contracts means setting clear expectations for raw data sources, such as schema and freshness rules. This helps catch unexpected changes or errors early. The flow starts with defining the contract, then loading data, checking it against the contract, and using it reliably downstream. If the data changes unexpectedly, the freshness checks and tests attached to the source report the violation as a warning or error. Fixing the data or updating the contract restores reliability. This process lets data consumers trust raw data and keep pipelines stable.

Practice


1. Why do we define raw data contracts in dbt sources?

easy

A. To set clear expectations for the raw data coming into the system

B. To speed up the data loading process

C. To automatically fix data errors

D. To create visual reports from raw data


Solution

Step 1: Understand the purpose of raw data contracts

Step 2: Identify the main benefit in dbt context

Final Answer: A. To set clear expectations for the raw data coming into the system

Quick Check:

Solution

Step 1: Recall dbt source YAML structure

Step 2: Match correct indentation and keys

Final Answer:

Quick Check:
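For the structure check, here is a minimal, correctly indented source file. The `database`, `schema`, and `identifier` values are hypothetical. When `schema` is omitted, dbt uses the source `name` as the schema. When `identifier` is omitted, it uses the table `name`. Models then select from this table with `{{ source('raw_sales', 'transactions') }}`.

```yaml
version: 2

sources:                                  # top-level key: a list of sources
  - name: raw_sales                       # first argument of source('raw_sales', ...)
    database: raw                         # hypothetical; defaults to the target database
    schema: sales                         # hypothetical; defaults to the source name
    tables:                               # list of tables under this source
      - name: transactions                # second argument of source(..., 'transactions')
        identifier: sales_transactions    # hypothetical physical table name
```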

Solution

Step 1: Understand the 'not_null' test in dbt

Step 2: Predict test behavior on null data

Final Answer:

Quick Check:
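A sketch of the `not_null` case, assuming a hypothetical `customer_id` column. dbt compiles the test into a query that selects rows where the column is null. The test fails if that query returns any rows, so a single null is enough to fail it (with the default `error` severity).

```yaml
version: 2

sources:
  - name: raw_sales
    tables:
      - name: transactions
        columns:
          - name: customer_id       # hypothetical column
            tests:
              - not_null            # fails if any row has customer_id IS NULL
```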

Solution

Step 1: Check YAML syntax for tests

Step 2: Identify the error in tests format

Final Answer:

Quick Check:
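Errors in test format are often caused by indentation or a missing list dash. This sketch shows the accepted shape, with one common mistake included as a comment. `tests` sits under a column and takes a list, each item names a test, and a test with arguments becomes a mapping. Column names are hypothetical. In dbt 1.8 and later, `data_tests` is the preferred spelling of this key, and `tests` is still accepted.

```yaml
version: 2

sources:
  - name: raw_sales
    tables:
      - name: transactions
        columns:
          - name: transaction_id    # hypothetical column
            # Incorrect: a plain string instead of a list
            #   tests: not_null unique
            # Correct: one list item per test
            tests:
              - not_null
              - unique
          - name: status            # hypothetical column
            tests:
              # A test with arguments is a mapping under its list item
              - accepted_values:
                  values: ['completed', 'refunded']
```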

Solution

Step 1: Identify required tests for 'order_id'

Step 2: Define tests for 'order_date'

Step 3: Combine tests in source YAML

Final Answer:

Quick Check:
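One way to combine these tests, assuming a hypothetical `orders` table in the `raw_sales` source and the usual key tests for `order_id`. `order_id` gets `unique` and `not_null` so it can serve as a key, and `order_date` gets `not_null`.

```yaml
version: 2

sources:
  - name: raw_sales
    tables:
      - name: orders                # hypothetical table
        columns:
          - name: order_id
            tests:
              - unique              # no duplicate order IDs
              - not_null            # every row has an order ID
          - name: order_date
            tests:
              - not_null            # every order has a date
```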