Overview - Why sources define raw data contracts
What is it?
A raw data contract is an agreement about the shape, quality, and expectations of data arriving from a source system before it is processed. In dbt, sources are where that agreement is written down: a source declaration names the raw tables a project depends on and can attach column tests and freshness checks to them. When the raw data stops matching the declaration, those checks can flag the problem before it reaches downstream models, which helps teams catch errors early and maintain trust in their data. In short, a source is a way to say, 'This is what the raw data should look like before we start working with it.'
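As a rough illustration of what such a contract can look like, here is a minimal sketch of a dbt source declaration. The file path, source name, schema, table, and column names are all hypothetical, and the exact key names and placement (for example `tests` versus `data_tests`, or where `freshness` lives) vary between dbt versions, so check the documentation for the version you run.

```yaml
# models/staging/sources.yml  (hypothetical path)
version: 2

sources:
  - name: raw_shop                    # hypothetical source name
    schema: raw                       # hypothetical schema holding the loaded tables
    tables:
      - name: orders                  # hypothetical raw table
        loaded_at_field: _loaded_at   # hypothetical load-timestamp column
        freshness:                    # expectation: data arrives at least daily
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        columns:
          - name: order_id
            tests:                    # expectation: one non-null row per order
              - unique
              - not_null
          - name: customer_id
            tests:
              - not_null
```

With a declaration like this in place, `dbt source freshness` checks the freshness thresholds and `dbt test --select source:raw_shop` runs the column tests against the raw tables.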
Why it matters
Without raw data contracts, teams risk building on unexpected or broken data, which causes errors downstream and can lead to wrong decisions. Defining the contract at the source catches these problems early, saving time and effort. It also gives data producers and consumers a shared, explicit statement of what the data should look like, which makes pipelines more reliable and easier to maintain. Teams without such a contract tend to spend more time fixing issues than analyzing data.
Where it fits
Before learning about raw data contracts, you should understand basic data modeling and how dbt sources are declared. After this, you can move on to data testing, data quality frameworks, and more advanced dbt features such as snapshots and exposures. This topic sits at the start of the data transformation journey, where the focus is on validating inputs.