Understanding Why Sources Define Raw Data Contracts in dbt
📖 Scenario: Imagine you work at a company where multiple teams deliver data to a central data warehouse. Each team sends raw data files in a different format and of varying quality. To keep the data clean and reliable, your team uses dbt to manage data transformations and to make sure everyone agrees on the data format.
🎯 Goal: You will build a simple example that shows why defining sources in dbt acts as a raw data contract. The contract tells your team what raw data to expect and how to check it before it is used in reports.
📋 What You'll Learn
Create a dictionary called raw_data_sources with exact source names and their expected columns
Create a variable called required_columns listing columns that must be present
Write a loop using for source, columns in raw_data_sources.items() to check if required columns exist
Print the results showing which sources meet the raw data contract
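The four steps above could be sketched roughly as follows. This is a minimal illustration, not the lesson's reference solution: the source names, the column names, and the results dictionary are all hypothetical and chosen only to show the pattern. In a real dbt project, sources are declared in YAML files rather than in Python, so treat this as an analogy for the contract idea.

```python
# Step 1: map each source name to the columns it is expected to deliver.
# These source and column names are hypothetical, for illustration only.
raw_data_sources = {
    "sales_team": ["order_id", "customer_id", "amount", "order_date"],
    "marketing_team": ["campaign_id", "customer_id", "spend"],
    "support_team": ["ticket_id", "order_id", "customer_id", "status"],
}

# Step 2: columns that every source must provide (assumed contract).
required_columns = ["order_id", "customer_id"]

# Step 3: check each source against the required columns.
# `results` is a helper added here; it records the missing columns per source.
results = {}
for source, columns in raw_data_sources.items():
    missing = [col for col in required_columns if col not in columns]
    results[source] = missing

    # Step 4: report whether the source meets the contract.
    if missing:
        print(f"{source}: does NOT meet the contract (missing: {', '.join(missing)})")
    else:
        print(f"{source}: meets the contract")
```

With these sample values, only marketing_team fails the check, because it does not provide order_id. Collecting the missing columns, rather than a plain pass/fail flag, makes it easier to tell the owning team exactly what to fix.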
💡 Why This Matters
🌍 Real World
In real companies, raw data comes from many places. Defining sources as contracts helps data teams trust and use data safely.
💼 Career
Data engineers and analysts use raw data contracts in dbt to ensure data quality and avoid errors in reports and dashboards.