We use YAML to tell dbt where to find our raw data tables. This helps dbt understand and organize the data before we work with it.
Configuring sources in YAML in dbt
version: 2
sources:
  - name: source_name
    database: your_database
    schema: your_schema
    tables:
      - name: table_name
        description: 'Description of the table'
        freshness:
          warn_after:
            count: 24
            period: hour
        tests:
          - unique:
              column_name: id
          - not_null:
              column_name: id
The version: 2 line is required by older dbt releases. Since dbt v1.5 it is optional, but including it does no harm.
Indentation defines nesting in YAML. Use 2 spaces per level and never tabs.
A source named sales_db with one table, customers:
version: 2
sources:
  - name: sales_db
    database: analytics
    schema: raw
    tables:
      - name: customers
        description: 'Customer details table'
version: 2
sources:
  - name: marketing_data
    database: marketing
    schema: public
    tables:
      - name: campaigns
        freshness:
          warn_after:
            count: 12
            period: hour
Tests that check the id column of the transactions table has unique and non-null values:
version: 2
sources:
  - name: finance
    database: finance_db
    schema: reports
    tables:
      - name: transactions
        tests:
          - unique:
              column_name: id
          - not_null:
              column_name: id
This YAML config tells dbt about the orders table in the ecommerce source. It includes a description, a freshness check that warns if the data is older than 6 hours, and tests to ensure data quality.
version: 2
sources:
  - name: ecommerce
    database: analytics_db
    schema: raw_data
    tables:
      - name: orders
        description: 'Raw orders data from ecommerce platform'
        freshness:
          warn_after:
            count: 6
            period: hour
        tests:
          - unique:
              column_name: id
          - not_null:
              column_name: id
Always keep your YAML files well-indented to avoid errors.
Use descriptive names and descriptions to help your team understand the data.
Run dbt source freshness and dbt test to check your source configurations.
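To illustrate the tip about descriptions, here is a minimal sketch that attaches a description at the source, table, and column level. It reuses the ecommerce source from the example above. The source-level and column-level description texts are illustrative assumptions, not taken from a real project.

```yaml
version: 2

sources:
  - name: ecommerce
    description: 'Raw tables loaded from the ecommerce platform'  # assumed wording
    database: analytics_db
    schema: raw_data
    tables:
      - name: orders
        description: 'Raw orders data from ecommerce platform'
        columns:
          - name: id
            description: 'Primary key of the order'  # assumed wording
```

Descriptions written this way are picked up when the project documentation is built with dbt docs generate.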
YAML files tell dbt where to find raw data tables.
You can add descriptions, freshness rules, and tests to sources.
Proper source configuration helps keep data organized and reliable.
Practice
Solution
Step 1: Understand the role of source configuration
Source configuration in dbt YAML files defines where raw data tables are located in the database.
Step 2: Differentiate from other dbt tasks
Writing SQL queries and scheduling runs are done elsewhere, not in source YAML files.
Final Answer:
To tell dbt where to find raw data tables -> Option B
Quick Check:
Source config = raw data location [OK]
- Confusing source config with SQL model code
- Thinking sources schedule runs
- Assuming sources create visualizations
Solution
Step 1: Recall correct YAML source structure
The correct syntax uses 'sources' as a list with 'name' and a nested 'tables' list, each entry with a 'name'.
Step 2: Compare options to syntax
The following option matches the correct indentation and keys exactly:
sources:
  - name: raw_data
    tables:
      - name: customers
Final Answer:
The YAML shown in Step 2 -> Option C
Quick Check:
Correct YAML keys and indentation = 'sources' list with a nested 'tables' list [OK]
- Using singular 'source' instead of 'sources'
- Missing 'name' key for tables
- Incorrect indentation breaking YAML structure
sources:
  - name: sales_data
    tables:
      - name: transactions
        loaded_at_field: transaction_date
Solution
Step 1: Locate the 'loaded_at_field' key in YAML
It is nested under the 'transactions' table inside the 'sales_data' source.
Step 2: Identify the value assigned
The value assigned to 'loaded_at_field' is 'transaction_date'.
Final Answer:
transaction_date -> Option A
Quick Check:
loaded_at_field value = transaction_date [OK]
- Confusing source name with field value
- Picking table name instead of field value
- Misreading YAML indentation levels
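For context, loaded_at_field names the timestamp column that dbt compares with the current time when it runs dbt source freshness. The sketch below builds on the YAML from this question. The freshness thresholds are assumptions chosen only for illustration.

```yaml
sources:
  - name: sales_data
    tables:
      - name: transactions
        loaded_at_field: transaction_date  # timestamp column checked for recency
        freshness:
          warn_after:    # assumed threshold
            count: 12
            period: hour
          error_after:   # assumed threshold
            count: 24
            period: hour
```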
sources:
  - name: marketing_data
    tables:
      - name: leads
        freshness:
          warn_after:
            count: 12
            period: hours
          error_after:
            count: 1
            period: days
Solution
Step 1: Understand dbt freshness period syntax
dbt freshness requires singular 'period' values: 'minute', 'hour', or 'day'. Plural forms ('hours', 'days') are invalid and cause errors.
Step 2: Check the YAML periods
'period: hours' and 'period: days' use the plural form, which dbt does not recognize.
Step 3: Rule out other options
A: The counts are logical (a warning at 12 hours comes before an error at 1 day, or 24 hours). B: The indentation is correct. C: Incorrect, because the error_after threshold should be longer than warn_after, not shorter.
Final Answer:
The 'period' values must be singular strings -> Option D
Quick Check:
period: hour/day (singular only) [OK]
- Using plural periods ('hours', 'days')
- Incorrect YAML indentation
- Thinking error_after time should be shorter than warn_after
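A corrected version of the configuration from this question might look like the sketch below. The only fix is the singular period values. The loaded_at_field line uses a hypothetical created_at column and is an added assumption, since dbt generally needs a timestamp column (or warehouse metadata, where the adapter supports it) to compute freshness.

```yaml
sources:
  - name: marketing_data
    tables:
      - name: leads
        loaded_at_field: created_at  # hypothetical column, not part of the original question
        freshness:
          warn_after:
            count: 12
            period: hour   # singular, not 'hours'
          error_after:
            count: 1
            period: day    # singular, not 'days'
```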
Solution
Step 1: Recall correct test syntax in source YAML
Tests are added under 'columns' with 'name' and a 'tests' list containing test names.
Step 2: Check each option's structure
The following option correctly uses a 'columns' list with 'name' and a 'tests' list containing 'not_null':
sources:
  - name: app_data
    tables:
      - name: users
        columns:
          - name: email
            tests:
              - not_null
Step 3: Identify errors in other options
sources: - name: app_data tables: - name: users tests: - column: email test: not_null uses wrong keys.
sources: - name: app_data tables: - users: columns: - email: tests: - not_null has wrong nesting.
sources: - name: app_data tables: - name: users columns: - email test: not_null uses 'test' instead of 'tests'.
Final Answer:
The YAML shown in Step 2 -> Option A
Quick Check:
Tests under columns with a 'tests' list [OK]
- Using 'test' instead of 'tests'
- Wrong nesting of columns and tests
- Misnaming keys like 'column' instead of 'name'
