Introduction
Raw data contracts help teams agree on what data looks like before using it. This avoids confusion and errors later.
sources:
  - name: raw_data_source
    tables:
      - name: raw_table
        freshness:
          warn_after: {count: 24, period: hour}
          error_after: {count: 48, period: hour}
        loaded_at_field: updated_at
        description: "Raw data contract for source table"

The sources block defines where raw data comes from.
Freshness settings help monitor if data is up-to-date.
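As a complement to the per-table settings above, here is a minimal sketch of setting freshness once at the source level so that every table inherits it. The second table, raw_lookup_table, is hypothetical and only illustrates opting a table out. Key placement can differ between dbt versions (newer releases may expect these keys under a config block), so treat this as an assumption to check against your version.

```yaml
version: 2

sources:
  - name: raw_data_source
    # Source-level defaults, inherited by every table listed below
    loaded_at_field: updated_at
    freshness:
      warn_after: {count: 24, period: hour}
      error_after: {count: 48, period: hour}
    tables:
      - name: raw_table            # uses the defaults above
      - name: raw_lookup_table     # hypothetical static table, for illustration
        freshness: null            # opt this table out of freshness checks
```

Freshness is evaluated when you run dbt source freshness, which compares the latest loaded_at_field value with the current time.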
A source named sales_db with a table orders:

sources:
  - name: sales_db
    tables:
      - name: orders
        description: "Contract for raw orders data"

A source named marketing_data with a table leads and freshness thresholds:

sources:
  - name: marketing_data
    tables:
      - name: leads
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        loaded_at_field: load_time

This example shows a raw data contract for an ecommerce customers table. It warns if the data is older than 6 hours and raises an error if it is older than 12 hours.
version: 2

sources:
  - name: ecommerce_raw
    description: "Raw data contract for ecommerce source"
    tables:
      - name: customers
        description: "Customer raw data with expected fields"
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}
        loaded_at_field: last_updated

# This YAML defines a raw data contract in dbt for the ecommerce customers table.
# It helps ensure data freshness and documents expectations.
Raw data contracts are written in YAML files in dbt projects.
They help automate data quality checks and documentation.
Raw data contracts define clear expectations for source data.
They help catch data issues early and keep teams aligned.
In dbt, raw data contracts are defined using sources in YAML files.
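To make the summary concrete, the following sketch combines descriptions, freshness thresholds, and column tests in a single source definition. It reuses the ecommerce_raw names from the example earlier; the customer_id column and the column descriptions are assumptions added for illustration and are not part of the original example.

```yaml
version: 2

sources:
  - name: ecommerce_raw
    description: "Raw data contract for ecommerce source"
    tables:
      - name: customers
        description: "Customer raw data with expected fields"
        loaded_at_field: last_updated
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}
        columns:
          - name: customer_id        # assumed column name, for illustration
            description: "Identifier expected to be present and unique"
            tests: [not_null, unique]
          - name: last_updated
            description: "Load timestamp used for the freshness check"
            tests: [not_null]
```

With a file like this, dbt test would run the column tests and dbt source freshness would check the thresholds, so one YAML file covers both kinds of expectation.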
Define sources: as a list with name and tables keys. A contract with column tests uses the sources list, name, tables, and columns with tests.

sources:
  - name: raw_sales
    tables:
      - name: transactions
        columns:
          - name: transaction_id
            tests: [not_null, unique]
          - name: amount
            tests: [not_null]
What happens if a transaction has a null amount when running dbt tests? The not_null test on amount fails, and dbt reports the failure.

sources:
  - name: raw_data
    tables:
      - name: customers
        columns:
          - name: customer_id
            tests: not_null, unique

When running dbt, you get an error. What is the problem? tests: not_null, unique is not a list: YAML reads it as the single string "not_null, unique", while dbt expects a list of tests. It should be tests: [not_null, unique].
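For reference, here is a sketch of the corrected customers source using a block-style list, which YAML treats the same as the flow-style [not_null, unique]. Either spelling should work; the choice is stylistic.

```yaml
sources:
  - name: raw_data
    tables:
      - name: customers
        columns:
          - name: customer_id
            # Flow style would be: tests: [not_null, unique]
            # Equivalent block style, one test per line:
            tests:
              - not_null
              - unique
```

The block style tends to be easier to extend when a test later needs its own arguments or configuration.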