What is Source in dbt: Definition and Usage Explained
A source is a way to declare and reference raw data tables that exist outside of your dbt models. It helps you document, test, and track these external tables within your dbt project.
How It Works
Think of a source in dbt as a label you put on raw data tables that come from outside your dbt transformations, like tables in your data warehouse that you don't build but use. By declaring these sources, dbt knows where your raw data lives and can track it for you.
This is like having a map for your ingredients before you start cooking a recipe. You mark where the raw ingredients (data) are stored so you can use them confidently in your recipes (models). It also allows dbt to run tests on these sources to make sure the raw data is healthy before you transform it.
Example
version: 2

sources:
  - name: raw_data
    tables:
      - name: customers
        description: "Raw customer data from the CRM system"

-- In a model file (e.g., models/my_customers.sql)
select * from {{ source('raw_data', 'customers') }}
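The declaration above relies on defaults: when no schema is given, dbt uses the source name as the schema, and the database comes from your target profile. If the raw table lives somewhere else, the location can be spelled out. The following is a minimal sketch; the database name `raw` and schema name `crm` are assumptions made up for illustration, not values from this article.

```yaml
version: 2

sources:
  - name: raw_data          # the name used in source('raw_data', ...)
    database: raw           # hypothetical database holding the landed data
    schema: crm             # hypothetical schema; defaults to the source name if omitted
    tables:
      - name: customers
        description: "Raw customer data from the CRM system"

# With this declaration, {{ source('raw_data', 'customers') }} in a model
# would compile to a relation along the lines of raw.crm.customers
# (exact quoting depends on the warehouse adapter).
```

Keeping the physical location in one YAML file means that if the raw table moves, only the declaration changes and every model that calls `source()` picks up the new location.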
When to Use
Use source declarations when you want to manage and document raw tables that your dbt models depend on but do not create. This is common when you have data landing in your warehouse from external systems like CRMs, ERPs, or logs.
Declaring sources helps you add tests to check data freshness and quality before transformations. It also improves project clarity by showing where data originates, which is useful for collaboration and debugging.
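Freshness and quality checks on a source might be sketched as follows. The column names `_loaded_at` and `customer_id` and the thresholds are hypothetical, chosen only for illustration. Key placement also varies by dbt version: newer releases prefer `data_tests:` over `tests:` and accept the freshness settings under a `config:` block.

```yaml
version: 2

sources:
  - name: raw_data
    loaded_at_field: _loaded_at        # hypothetical timestamp column written by the loader
    freshness:
      warn_after: {count: 12, period: hour}    # warn if the newest row is older than 12 hours
      error_after: {count: 24, period: hour}   # fail if it is older than 24 hours
    tables:
      - name: customers
        columns:
          - name: customer_id          # hypothetical primary key column
            tests:
              - unique
              - not_null
```

With a declaration like this, `dbt source freshness` checks the thresholds and `dbt test --select source:raw_data` runs the column tests against the raw table.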
Key Points
- A source declares an external raw table in your dbt project.
- It helps with documentation and testing of raw data.
- Use {{ source('source_name', 'table_name') }} to reference sources in models.
- Sources improve data lineage and project clarity.