Why sources define raw data contracts in dbt - The Real Reasons

The Big Idea

What if you could stop chasing data errors and start trusting your data every day?

The Scenario

Imagine you receive data files from different teams every day. The files arrive in different formats, with missing columns or unexpected changes, so you check each one by hand before using it.

The Problem

This manual checking is slow and tiring. You often miss changes or errors, causing your reports to be wrong. Fixing these mistakes later wastes time and causes frustration.

The Solution

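Defining a raw data contract means setting clear rules about what the data should look like before you use it. In dbt, a source declaration does this: it names each raw table, documents its columns, and attaches tests such as not_null that run whenever you run dbt test. Problems surface early, where data enters the pipeline, instead of in downstream reports.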

Before vs After
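Before
# Manual check (pandas): process() runs only if 'date' exists and has no nulls
if 'date' in data.columns and data['date'].notnull().all():
    process(data)
After
# Illustrative pseudocode, not a real dbt API: dbt declares these expectations
# in a YAML sources file and checks them with tests such as not_null
sources('raw_data').expect_columns(['date', 'id', 'value']).expect_not_null('date')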
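In dbt itself, the "After" idea is usually written as a source declaration in a YAML file rather than as a method chain. The following is a minimal sketch, assuming a source named raw_data with a hypothetical table called events (the table name and file path are assumptions, not from the original). On dbt versions before 1.8 the key is tests instead of data_tests.

```yaml
# models/staging/sources.yml (hypothetical path)
version: 2

sources:
  - name: raw_data
    tables:
      - name: events          # hypothetical table name
        columns:
          - name: date
            data_tests:
              - not_null      # fails if any row has a NULL date
          - name: id          # listed for documentation only
          - name: value       # listed for documentation only
```

Listing id and value here only documents them. A column is checked only when a test is attached to it, so the contract is as strict as the tests you choose to add.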
What It Enables

It lets you trust your data pipeline and focus on analysis, not fixing data problems.

Real Life Example

A marketing team defines a contract for customer data to ensure every record has an ID and signup date. This prevents errors in campaign reports caused by missing or wrong data.
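The marketing contract described above could be sketched as a dbt source like the one below. The source name marketing, the table name customers, and the column names are assumptions for illustration. The unique test on id is also an assumption that goes beyond "every record has an ID". On dbt versions before 1.8, use tests in place of data_tests.

```yaml
version: 2

sources:
  - name: marketing           # hypothetical source name
    tables:
      - name: customers       # hypothetical table name
        columns:
          - name: id
            data_tests:
              - not_null      # every record has an ID
              - unique        # assumption: IDs should not repeat
          - name: signup_date
            data_tests:
              - not_null      # every record has a signup date
```

With a file like this in place, running dbt test --select source:marketing would check only the tests attached to that source, so a bad load can be caught before campaign reports are built on it.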

Key Takeaways

Manual data checks are slow and error-prone.

Raw data contracts set clear expectations for incoming data.

They help catch issues early and keep data reliable.