
Why sources define raw data contracts in dbt - Visual Breakdown

Concept Flow - Why sources define raw data contracts
Raw Data Source
  -> Define Raw Data Contract
  -> Set Expectations: Schema, Types, Quality
  -> Data Consumers Use Contract
  -> Detect Changes or Errors Early
  -> Maintain Data Reliability
This flow shows how defining raw data contracts sets clear rules for raw data, helping users trust and use data safely.
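The "Set Expectations: Schema, Types, Quality" step can be sketched as column-level declarations on the source. This is a minimal illustration, not a definitive config: the column names (`transaction_id`, `status`) and the accepted values are hypothetical, and the exact key names can vary by dbt version (newer releases also accept `data_tests:` in place of `tests:`).

```yaml
version: 2

sources:
  - name: raw_sales
    tables:
      - name: transactions
        columns:
          - name: transaction_id        # hypothetical column name
            description: Identifier of the raw transaction record
            data_type: varchar          # documents the expected type (assumed here)
            tests:
              - not_null                # quality rule: no missing identifiers
              - unique                  # quality rule: no duplicate identifiers
          - name: status                # hypothetical column name
            tests:
              - accepted_values:
                  values: ['pending', 'completed', 'refunded']  # assumed value set
```

Running `dbt test` evaluates these rules against the raw table, so a violation surfaces before downstream models rely on the data.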
Execution Sample
dbt
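sources:
  - name: raw_sales              # source name that models reference
    tables:
      - name: transactions       # raw table covered by the contract
        freshness:
          warn_after:            # warn when the newest data is older than 24 hours
            count: 24
            period: hour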
This dbt source config defines a raw data contract for the 'transactions' table with a freshness expectation: dbt warns when the most recent data is more than 24 hours old.
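A slightly fuller freshness sketch is shown here for comparison. It assumes the raw table has a load-timestamp column, named `_loaded_at` purely for illustration, and it adds an `error_after` threshold that is not part of the original sample (the 48-hour value is an arbitrary choice).

```yaml
version: 2

sources:
  - name: raw_sales
    tables:
      - name: transactions
        # Hypothetical timestamp column that dbt compares against the current time
        loaded_at_field: _loaded_at
        freshness:
          # Warn when the newest row is more than 24 hours old
          warn_after: {count: 24, period: hour}
          # Report an error when the newest row is more than 48 hours old (assumed threshold)
          error_after: {count: 48, period: hour}
```

Freshness is evaluated when `dbt source freshness` is run, so the check only protects the pipeline if that command is part of the regular schedule.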
Execution Table
Step | Action | Evaluation | Result
1 | Define source 'raw_sales.transactions' | Set schema and freshness rules | Contract established for raw data
2 | Data pipeline loads raw data | Check data against contract | Data matches schema and freshness
3 | Data consumer queries source | Uses contract to trust data | Reliable data used in models
4 | Raw data changes unexpectedly | Contract detects schema or freshness violation | Alert triggered for investigation
5 | Fix data or update contract | Restore contract compliance | Data reliability maintained
6 | End | All checks passed or issues resolved | Data pipeline stable
💡 Execution stops when the data contract is met, or when violations have been detected and addressed.
Variable Tracker
Variable | Start | After Step 2 | After Step 4 | Final
Data Contract Status | Not defined | Active and valid | Violation detected | Restored or updated
Key Moments - 2 Insights
Why do we define a raw data contract instead of just trusting the raw data?
Defining a contract sets clear expectations for schema and freshness, so any unexpected changes or errors are caught early, as shown in step 4 of the execution table.
What happens if the raw data changes but the contract is not updated?
The contract detects violations and triggers alerts (step 4), preventing unreliable data from being used downstream.
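How loudly a violation is reported can be tuned per test. The sketch below is one possible setup under stated assumptions: `transaction_id` is a hypothetical column, and the choice of which rule only warns is arbitrary.

```yaml
version: 2

sources:
  - name: raw_sales
    tables:
      - name: transactions
        columns:
          - name: transaction_id      # hypothetical column name
            tests:
              - not_null              # default severity is error: the test fails
              - unique:
                  config:
                    severity: warn    # report duplicates as a warning instead of a failure
```

With error severity, a command such as `dbt build` is expected to skip models downstream of the failing test, which is how unreliable data is kept out of later steps; with `warn`, the run continues and the issue is only reported.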
Visual Quiz - 3 Questions
Test your understanding
According to the execution table, what is the result after step 3?
A. Data consumer uses unreliable data
B. Contract is violated
C. Data consumer uses reliable data
D. Data pipeline stops
💡 Hint
Check the 'Result' column in the row for step 3 of the execution table.
At which step does the contract detect a violation?
A. Step 4
B. Step 2
C. Step 3
D. Step 5
💡 Hint
Look for the row whose 'Evaluation' column mentions a schema or freshness violation in the execution table.
If the contract was never defined, what would the status be at 'After Step 2' in the Variable Tracker?
A. Violation detected
B. Not defined
C. Active and valid
D. Restored or updated
💡 Hint
Refer to the 'Start' value in the 'Data Contract Status' row of the Variable Tracker.
Concept Snapshot
Define raw data contracts in dbt sources to set clear rules on schema and freshness.
Contracts help detect data issues early.
They give data consumers a basis for trusting raw data.
Violations trigger alerts so issues can be fixed quickly.
Together, these checks maintain overall data reliability.
Full Transcript
In dbt, defining raw data contracts means setting clear expectations for raw data sources, such as schema and freshness rules. This helps catch any unexpected changes or errors early. The flow starts with defining the contract, then loading data, checking it against the contract, and using it reliably downstream. If data changes unexpectedly, the contract detects violations and triggers alerts. Fixing data or updating the contract restores reliability. This process ensures data consumers can trust raw data and maintain stable pipelines.