Configuring sources in YAML
📖 Scenario: You are working on a data project using dbt. You want to tell dbt where your raw data lives by configuring sources in a YAML file. This helps dbt understand your data tables before you transform them.
🎯 Goal: Learn how to write a YAML configuration for a source in dbt. You will create a source with a name, specify the database and schema, and list the tables inside it.
📋 What You'll Learn
Create a YAML file with a source configuration
Define a source named raw_data
Set the database to analytics_db
Set the schema to public
Add a table named customers with a description
Add a table named orders with a description
💡 Why This Matters
🌍 Real World
In real data projects, configuring sources in YAML helps document and manage raw data tables before transforming them.
💼 Career
Data engineers and analysts use source configurations in dbt to build reliable and maintainable data pipelines.
1
Create the basic source structure
Create a YAML file and write a source configuration with the name raw_data. Set the database to analytics_db and the schema to public. Do not add any tables yet.
Hint
Start with sources: then add a list item with name, database, and schema.
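A minimal sketch of what this first step could look like, assuming the file lives under the project's models directory. The file name sources.yml is hypothetical, and the version: 2 line is included only for compatibility with older dbt releases, since recent releases treat it as optional.

```yaml
# models/sources.yml (hypothetical file name; any .yml file under models/ works)
version: 2  # optional in recent dbt releases

sources:
  - name: raw_data          # name used to refer to this source
    database: analytics_db  # database that holds the raw tables
    schema: public          # schema inside that database
```

Note that sources: holds a list, so the source itself starts with a dash and its keys are indented beneath it.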
2
Add the first table to the source
Add a table named customers inside the tables list of the raw_data source. Add a description: Customer details table.
Hint
Inside tables:, add a list item with name: customers and description.
3
Add the second table to the source
Add another table named orders inside the tables list of the raw_data source. Add a description: Orders placed by customers.
Hint
Add a second list item under tables: with the orders table and its description.
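With both tables added, the configuration might look like the following sketch. It uses only the names and descriptions given in the steps; the version: 2 line is an assumption for older dbt releases and is optional in recent ones.

```yaml
version: 2  # optional in recent dbt releases

sources:
  - name: raw_data
    database: analytics_db
    schema: public
    tables:
      # each table is a list item with its own name and description
      - name: customers
        description: Customer details table
      - name: orders
        description: Orders placed by customers
```

Once a source is declared this way, a model can refer to a table with dbt's source function, for example {{ source('raw_data', 'customers') }}.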
4
Display the complete YAML source configuration
Print the complete YAML source configuration for raw_data with both tables customers and orders and their descriptions.
Hint
Print the YAML text exactly as configured in previous steps.
Practice
(1/5)
1. What is the main purpose of configuring sources in a dbt YAML file?
easy
A. To write SQL queries for data transformation
B. To tell dbt where to find raw data tables
C. To create dashboards for data visualization
D. To schedule dbt runs automatically
Solution
Step 1: Understand the role of source configuration
Source configuration in dbt YAML files defines where raw data tables are located in the database.
Step 2: Differentiate from other dbt tasks
Writing SQL queries and scheduling runs are done elsewhere, not in source YAML files.
Final Answer:
To tell dbt where to find raw data tables -> Option B
Quick Check:
Source config = raw data location
Hint: Sources define raw table locations in YAML
Common Mistakes:
Confusing source config with SQL model code
Thinking sources schedule runs
Assuming sources create visualizations
2. Which of the following is the correct syntax to define a source in a dbt YAML file?
easy
A. source:
     name: raw_data
     table:
       - customers
B. sources:
     name: raw_data
     tables:
       - customers
C. sources:
     - name: raw_data
       tables:
         - name: customers
D. source:
     - raw_data:
         tables:
           - customers
Solution
Step 1: Recall correct YAML source structure
The correct syntax uses 'sources' as a list with 'name' and nested 'tables' list, each with a 'name'.
Step 2: Compare options to syntax
Option C matches the correct keys and indentation exactly:
sources:
  - name: raw_data
    tables:
      - name: customers
A. 'warn_after' and 'error_after' counts are reversed
B. The indentation under 'freshness' is incorrect
C. The 'error_after' period should be less than 'warn_after'
D. The 'period' values must be singular strings
Solution
Step 1: Understand dbt freshness period syntax
dbt freshness requires singular 'period' values like 'hour', 'day', 'minute'. Plural forms ('hours', 'days') are invalid and cause errors.
Step 2: Check the YAML periods
'period: hours' and 'period: days' use plural, which dbt does not recognize.
Step 3: Rule out other options
A: The counts are logical (warn after 12 hours, error after 1 day, that is, 24 hours). B: The indentation is correct. C: Incorrect; the error_after window must be longer than the warn_after window, not shorter.
Final Answer:
The 'period' values must be singular strings -> Option D
Quick Check:
period: hour/day (singular only)
Hint: dbt freshness periods must be singular (hour, day)
Common Mistakes:
Using plural periods ('hours', 'days')
Incorrect YAML indentation
Thinking error_after time should be shorter than warn_after
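As an illustration of the singular-period rule, a freshness block might look like this sketch. The counts follow the explanation above (warn after 12 hours, error after 1 day). The source and table names are borrowed from the earlier exercise, the loaded_at_field column name _loaded_at is hypothetical, and the exact placement of these keys can differ between dbt versions.

```yaml
sources:
  - name: raw_data
    database: analytics_db
    schema: public
    loaded_at_field: _loaded_at  # hypothetical timestamp column
    freshness:
      warn_after:
        count: 12
        period: hour   # singular, not "hours"
      error_after:
        count: 1
        period: day    # singular, not "days"
    tables:
      - name: orders
```

Here the error threshold (1 day) is longer than the warning threshold (12 hours), so a stale source first raises a warning and only later an error.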
5. You want to add a test to ensure the 'email' column in the 'users' table source is never null. Which YAML snippet correctly adds this test?