
What is Source in dbt: Definition and Usage Explained

In dbt, a source is a way to declare and reference raw data tables that exist outside of your dbt models. It helps you document, test, and track these external tables within your dbt project.

How It Works

Think of a source in dbt as a label you attach to raw tables that your dbt transformations read but do not build, such as tables that an ingestion tool loads into your data warehouse. Once you declare these sources, dbt knows where the raw data lives and can include it in your project's documentation and lineage graph.

This is like mapping out where your ingredients are stored before you start cooking. You mark where the raw ingredients (data) live so you can use them confidently in your recipes (models). Declaring sources also lets dbt run checks on them, such as verifying that a key column is never null or that new data arrived recently, so you catch problems in the raw data before you transform it.
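A source entry can spell out exactly where a raw table lives. The sketch below is a minimal example; the database `raw_db`, schema `crm`, and physical table name `crm_customers` are hypothetical placeholders. If `schema` is omitted, dbt uses the source's `name` as the schema, and if `database` is omitted, it uses the target database from your connection profile.

```yaml
version: 2

sources:
  - name: raw_data                 # logical name used in source('raw_data', ...)
    database: raw_db               # hypothetical warehouse database
    schema: crm                    # hypothetical schema the loader writes to
    tables:
      - name: customers            # logical name used in source(..., 'customers')
        identifier: crm_customers  # hypothetical physical table name
```

With this config, `{{ source('raw_data', 'customers') }}` would compile to a reference to `raw_db.crm.crm_customers` (exact quoting depends on your adapter), so models keep using the short logical names even if the physical table is renamed.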


Example

Here is a simple example of how to declare a source in dbt and reference it in a model.
# In a YAML file (e.g., models/sources.yml)
version: 2
sources:
  - name: raw_data
    tables:
      - name: customers
        description: "Raw customer data from the CRM system"

-- In a model file (e.g., models/my_customers.sql)
select * from {{ source('raw_data', 'customers') }}
Output
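The YAML file produces no output by itself. When dbt compiles the model, it replaces {{ source('raw_data', 'customers') }} with the fully qualified name of the customers table (because no schema is set, dbt uses the source name raw_data as the schema) and records the source as an upstream dependency of my_customers.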

When to Use

Use source declarations when you want to manage and document raw tables that your dbt models depend on but do not create. This is common when you have data landing in your warehouse from external systems like CRMs, ERPs, or logs.
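A project that ingests data from several systems typically declares one source per system. In this sketch, the source names (`crm`, `erp`, `app_logs`), their schemas, and their tables are all hypothetical placeholders.

```yaml
version: 2

sources:
  - name: crm              # e.g. tables synced from a CRM
    schema: raw_crm        # hypothetical schema
    tables:
      - name: customers
      - name: contacts
  - name: erp              # e.g. tables exported from an ERP
    schema: raw_erp        # hypothetical schema
    tables:
      - name: orders
      - name: invoices
  - name: app_logs         # e.g. event logs landed by a pipeline
    schema: raw_logs       # hypothetical schema
    tables:
      - name: page_views
```

Models would then reference these tables as, for example, `{{ source('erp', 'orders') }}` or `{{ source('app_logs', 'page_views') }}`.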

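Declaring sources lets you attach data tests and freshness checks to raw tables, so quality problems surface before your transformations run. It also makes the project easier to understand, because the documentation and lineage graph show where each model's data originates, which helps with collaboration and debugging.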
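Here is a minimal sketch of adding checks to the `raw_data.customers` source from the example, assuming the table has a `customer_id` column and an `_loaded_at` timestamp column written by the loader (both column names are hypothetical). `unique` and `not_null` are dbt's built-in generic tests, and the `freshness` block is evaluated by `dbt source freshness`. Key names and placement vary by dbt version: releases before 1.8 use `tests:` rather than `data_tests:`, and newer releases may expect `freshness` and `loaded_at_field` inside a `config:` block.

```yaml
version: 2

sources:
  - name: raw_data
    tables:
      - name: customers
        description: "Raw customer data from the CRM system"
        loaded_at_field: _loaded_at              # hypothetical load-timestamp column
        freshness:
          warn_after: {count: 12, period: hour}  # warn if newest row is older than 12 hours
          error_after: {count: 24, period: hour} # error if older than 24 hours
        columns:
          - name: customer_id                    # hypothetical key column
            description: "Customer identifier from the CRM"
            data_tests:
              - unique
              - not_null
```

You would then run `dbt source freshness` to check how recent the data is, and `dbt test --select source:raw_data` to run the column tests on that source.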

Key Points

  • A source declares external raw tables in your dbt project.
  • It helps with documentation and testing of raw data.
  • Use {{ source('source_name', 'table_name') }} to reference sources in models.
  • Sources appear in dbt's lineage graph, making it clear where each model's data comes from.

Key Takeaways

A source in dbt marks raw tables outside your transformations for easy reference and testing.
Declaring sources lets you test raw data quality and freshness before your models use it.
Use sources to document where your raw data comes from in your project.
Refer to sources in SQL models with the source() function for clear lineage.