Configuring sources in YAML in dbt

Introduction

We use YAML files to declare sources: the raw tables in the warehouse that dbt models read from. Declaring them lets dbt document, test, and track where that data comes from before we transform it.

Configure sources in the following situations:
When you want to tell dbt about the raw tables in your database.
When you need to document your data sources clearly for your team.
When you want to track freshness or test the quality of your source data.
When you want to reference raw tables in your dbt models with the source() function instead of hard-coding table names.
Syntax
dbt
version: 2
sources:
  - name: source_name
    database: your_database
    schema: your_schema
    tables:
      - name: table_name
        description: 'Description of the table'
        loaded_at_field: _loaded_at  # timestamp column used for freshness; placeholder name
        freshness:
          warn_after:
            count: 24
            period: hour
        tests:
          - unique:
              column_name: id
          - not_null:
              column_name: id

The version: 2 line is required by dbt versions before 1.5. Later versions treat it as optional, but it is still commonly included.

Indentation defines structure in YAML. Use spaces, not tabs, and indent consistently; 2 spaces per level is the usual convention.
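
As a rough guide to the nesting, the sketch below annotates each level of a source definition. The names raw_shop, raw, and orders are placeholders, not values from any real project.

```yaml
version: 2
sources:                 # top-level key holding a list of sources
  - name: raw_shop       # one list item per source
    schema: raw          # same indent as name, so it belongs to the source
    tables:              # still a source-level key
      - name: orders     # one list item per table
        description: 'Raw orders'   # belongs to the orders table
```

Moving a key one level left or right changes which source or table it applies to, so it helps to line up sibling keys exactly.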

Examples
This example defines a source named sales_db with one table, customers.
dbt
version: 2
sources:
  - name: sales_db
    database: analytics
    schema: raw
    tables:
      - name: customers
        description: 'Customer details table'
This example adds freshness settings to warn if data is older than 12 hours.
dbt
version: 2
sources:
  - name: marketing_data
    database: marketing
    schema: public
    tables:
      - name: campaigns
        loaded_at_field: _loaded_at  # timestamp column used for freshness; placeholder name
        freshness:
          warn_after:
            count: 12
            period: hour
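
A slightly fuller sketch, assuming the tables carry a timestamp column recording when each row was loaded. The column name _loaded_at and the channels table are hypothetical; loaded_at_field should name whatever column your loader actually writes. Here the freshness rule is set once at the source level so every table inherits it, error_after is added alongside warn_after, and one table opts out with freshness: null.

```yaml
version: 2
sources:
  - name: marketing_data
    database: marketing
    schema: public
    loaded_at_field: _loaded_at   # hypothetical column name
    freshness:                    # default for every table in this source
      warn_after:
        count: 12
        period: hour
      error_after:
        count: 24
        period: hour
    tables:
      - name: campaigns           # inherits the source-level rule
      - name: channels            # hypothetical table
        freshness: null           # skip freshness checks for this table
```

Setting the rule at the source level avoids repeating it per table; a table that defines its own freshness block overrides the default.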
This example adds tests to check that the id column of the transactions table is unique and never null.
dbt
version: 2
sources:
  - name: finance
    database: finance_db
    schema: reports
    tables:
      - name: transactions
        tests:
          - unique:
              column_name: id
          - not_null:
              column_name: id
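
The same two checks are more commonly declared on the column itself, under a columns: list, where the test names alone are enough. A minimal sketch reusing the finance source from the example above; the column description is illustrative. dbt 1.8 and later also accept data_tests: as the key name in place of tests:.

```yaml
version: 2
sources:
  - name: finance
    database: finance_db
    schema: reports
    tables:
      - name: transactions
        columns:
          - name: id
            description: 'Primary key of the transaction'  # illustrative
            tests:
              - unique
              - not_null
```

Declaring tests per column keeps each check next to the column it describes, which tends to read better as the number of columns grows.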
Sample Program

This YAML config declares the orders table in the ecommerce source. It includes a description, a freshness check that warns if data is older than 6 hours, and tests that check the id column is unique and not null.

dbt
version: 2
sources:
  - name: ecommerce
    database: analytics_db
    schema: raw_data
    tables:
      - name: orders
        description: 'Raw orders data from ecommerce platform'
        loaded_at_field: _loaded_at  # timestamp column used for freshness; placeholder name
        freshness:
          warn_after:
            count: 6
            period: hour
        tests:
          - unique:
              column_name: id
          - not_null:
              column_name: id
Important Notes

Keep YAML indentation consistent. A misplaced key can cause a parse error or attach a setting to the wrong source or table.

Use descriptive names and descriptions to help your team understand the data.

Run dbt source freshness to evaluate the freshness rules and dbt test to run the tests defined on your sources.

Summary

YAML files tell dbt where to find raw data tables.

You can add descriptions, freshness rules, and tests to sources.

Proper source configuration helps keep data organized and reliable.