Configuring sources in YAML in dbt - Why You Should Know This
What if you could update every reference to a data source by changing just one file?
Imagine your project reads many tables from different databases and schemas, and every query has to spell out exactly where each table lives.
Manually tracking each data source in code is slow and confusing. You might forget to update a table name or location, causing errors that are hard to find.
Configuring sources in YAML keeps every source's details in one clear place. Models refer to a source by name, so a location change is made once instead of in every query, which leaves far less room for mistakes.
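Without sources, every query hard-codes the location:
```sql
-- Hard to track: the same location details are scattered across every query
SELECT * FROM database.schema.table_name;
```
With sources, the location is declared once in YAML:
```yaml
sources:
  - name: my_source
    tables:
      - name: table_name
```
Models then reference the source by name:
```sql
SELECT * FROM {{ source('my_source', 'table_name') }}
```
Declaring sources this way enables clean, centralized source management that makes your data projects easier to maintain and less error-prone.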
For example, a data analyst working on sales reports can update the source location in one YAML file, and every report model that uses source() picks up the new location on the next dbt run, without editing each SQL query.
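A minimal sketch of that sales scenario. The source name sales, the database analytics_raw, the schema sales_2024 and the table names are all hypothetical; the point is that database and schema are the only place the location is written down.

```yaml
sources:
  - name: sales                # models refer to this source as source('sales', ...)
    database: analytics_raw    # hypothetical database name
    schema: sales_2024         # hypothetical; the only line to edit when the data moves
    tables:
      - name: orders           # hypothetical table names
      - name: refunds
```

With this file, select * from {{ source('sales', 'orders') }} resolves to the orders table in schema sales_2024 of database analytics_raw, so pointing the reports at a new schema means changing only the schema value. If the schema key is left out, dbt uses the source name (here, sales) as the schema.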
Manual source tracking is slow and error-prone.
YAML centralizes source info in one place.
This makes data projects easier to update and maintain.
Practice
Solution
Step 1: Understand the role of source configuration
Source configuration in dbt YAML files defines where raw data tables are located in the database.
Step 2: Differentiate from other dbt tasks
Writing SQL models and scheduling runs happen elsewhere, not in source YAML files.
Final Answer:
To tell dbt where to find raw data tables -> Option B
Quick Check:
Source config = raw data location [OK]
Common Mistakes:
- Confusing source config with SQL model code
- Thinking sources schedule runs
- Assuming sources create visualizations
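Because a source entry only records location, it can also hide an awkward physical table name behind a friendlier one. The sketch below assumes a hypothetical source crm whose raw table is actually called tbl_cust_v2; the identifier key tells dbt the real table name while models keep using customers.

```yaml
sources:
  - name: crm                    # hypothetical source; also used as the schema since none is set
    tables:
      - name: customers          # name used in source('crm', 'customers')
        identifier: tbl_cust_v2  # hypothetical physical table name in the warehouse
```

Nothing in this entry schedules, transforms, or visualizes data; dbt only uses it to resolve source('crm', 'customers') to the tbl_cust_v2 table in the crm schema.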
Solution
Step 1: Recall correct YAML source structure
The top-level key is 'sources' (plural). It holds a list of sources, each with a 'name' and a nested 'tables' list, and every table entry has its own 'name'.
Step 2: Compare options to syntax
Option C matches the required keys and indentation exactly:
```yaml
sources:
  - name: raw_data
    tables:
      - name: customers
```
Final Answer:
Option C
Quick Check:
'sources' list -> 'name' + 'tables' list -> 'name' = correct keys and indentation [OK]
Common Mistakes:
- Using singular 'source' instead of 'sources'
- Missing 'name' key for tables
- Incorrect indentation breaking YAML structure
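For reference, here is the raw_data structure from this question with each level annotated against the mistakes listed. The second table, orders, is a hypothetical addition to show that 'tables' is a list that can hold several entries.

```yaml
sources:                 # plural key, not "source"
  - name: raw_data       # each source is a list item with its own name
    tables:              # nested under the source, aligned with its name key
      - name: customers  # every table entry needs a name key
      - name: orders     # hypothetical second table in the same source
```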
```yaml
sources:
  - name: sales_data
    tables:
      - name: transactions
        loaded_at_field: transaction_date
```
Solution
Step 1: Locate the 'loaded_at_field' key in YAML
It is nested under the 'transactions' table inside the 'sales_data' source.
Step 2: Identify the value assigned
The value assigned to 'loaded_at_field' is 'transaction_date'.
Final Answer:
transaction_date -> Option A
Quick Check:
loaded_at_field value = transaction_date [OK]
Common Mistakes:
- Confusing source name with field value
- Picking table name instead of field value
- Misreading YAML indentation levels
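In practice, loaded_at_field names the timestamp column dbt reads when it checks source freshness, so it usually sits next to a freshness block. A minimal sketch reusing the sales_data source; the threshold values are illustrative assumptions, not part of the question.

```yaml
sources:
  - name: sales_data
    tables:
      - name: transactions
        loaded_at_field: transaction_date        # column dbt compares against the current time
        freshness:
          warn_after: {count: 6, period: hour}    # illustrative threshold
          error_after: {count: 24, period: hour}  # illustrative threshold
```

Running dbt source freshness then reports how old the newest transaction_date value is and whether it crosses either threshold.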
```yaml
sources:
  - name: marketing_data
    tables:
      - name: leads
        freshness:
          warn_after:
            count: 12
            period: hours
          error_after:
            count: 1
            period: days
```
Solution
Step 1: Understand dbt freshness period syntax
dbt accepts only the singular 'period' values 'minute', 'hour', and 'day'. Plural forms such as 'hours' or 'days' are invalid and cause errors.
Step 2: Check the YAML periods
'period: hours' and 'period: days' use plural forms, which dbt does not recognize.
Step 3: Rule out other options
- Option A: the counts are logical (warn after 12 hours, error after 1 day, i.e. 24 hours).
- Option B: the indentation is correct.
- Option C: incorrect, because error_after should be the longer threshold, not shorter than warn_after.
Final Answer:
The 'period' values must be singular strings -> Option D
Quick Check:
period: hour/day (singular only) [OK]
Common Mistakes:
- Using plural periods ('hours', 'days')
- Incorrect YAML indentation
- Thinking error_after time should be shorter than warn_after
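A corrected version of the marketing_data example, keeping the original thresholds and changing only the period values to their singular forms. A loaded_at_field is added so the freshness check has a timestamp column to compare against; its value, created_at, is a hypothetical column name that does not appear in the question.

```yaml
sources:
  - name: marketing_data
    tables:
      - name: leads
        loaded_at_field: created_at  # hypothetical timestamp column
        freshness:
          warn_after:
            count: 12
            period: hour   # singular: minute, hour, or day
          error_after:
            count: 1
            period: day    # 1 day = 24 hours, longer than the warn threshold
```

With this configuration, dbt source freshness warns once the newest created_at is more than 12 hours old and errors once it is more than 1 day old.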
Solution
Step 1: Recall correct test syntax in source YAML
Tests are added under 'columns': each column entry has a 'name' and a 'tests' list containing test names.
Step 2: Check each option's structure
Option A correctly uses a 'columns' list whose entry has 'name: email' and a 'tests' list containing 'not_null':
```yaml
sources:
  - name: app_data
    tables:
      - name: users
        columns:
          - name: email
            tests:
              - not_null
```
Step 3: Identify errors in other options
- 'sources: - name: app_data tables: - name: users tests: - column: email test: not_null' uses the wrong keys ('column' and 'test').
- 'sources: - name: app_data tables: - users: columns: - email: tests: - not_null' has the wrong nesting (table and column names used as keys instead of 'name' values).
- 'sources: - name: app_data tables: - name: users columns: - email test: not_null' uses 'test' instead of 'tests'.
Final Answer:
Option A
Quick Check:
Tests go under columns, in a 'tests' list [OK]
Common Mistakes:
- Using 'test' instead of 'tests'
- Wrong nesting of columns and tests
- Misnaming keys like 'column' instead of 'name'
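Extending the correct answer: a column's 'tests' list can hold several tests, and several columns can each carry their own. The id column and its unique test are hypothetical additions; unique and not_null are both built-in dbt generic tests. Newer dbt releases (1.8 onward) also accept the key data_tests in place of tests.

```yaml
sources:
  - name: app_data
    tables:
      - name: users
        columns:
          - name: id        # hypothetical primary-key column
            tests:
              - unique
              - not_null
          - name: email
            tests:
              - not_null
```

To run only the tests defined on this source, one option is dbt test --select source:app_data.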
