
Configuring sources in YAML in dbt - Visual Walkthrough

Concept Flow - Configuring sources in YAML
  1. Start YAML file
  2. Define 'sources' key
  3. Add source name
  4. Add tables under source
  5. Specify table details (name, description)
  6. Save YAML
  7. dbt reads source config
  8. Sources available for models
This flow shows how to write a YAML file to define data sources and tables for dbt to use in models.
Execution Sample
sources:
  - name: raw_data
    tables:
      - name: users
        description: 'User data from app'
Defines a source named 'raw_data' with a table 'users' and a description.
Execution Table
| Step | YAML Line | Action | State Change | Result |
|------|-----------|--------|--------------|--------|
| 1 | sources: | Start defining sources | Create 'sources' key | Empty list for sources |
| 2 | - name: raw_data | Add source name | Append source dict with name 'raw_data' | sources = [{'name': 'raw_data'}] |
| 3 | tables: | Add tables key | Add empty 'tables' list to source | sources[0]['tables'] = [] |
| 4 | - name: users | Add table name | Append table dict with name 'users' | sources[0]['tables'] = [{'name': 'users'}] |
| 5 | description: 'User data from app' | Add description | Add description to table dict | sources[0]['tables'][0]['description'] = 'User data from app' |
| 6 | End of YAML | Finish parsing | YAML fully parsed | Source config ready for dbt |
💡 Reached end of YAML file, source configuration complete.
Variable Tracker
| Stage | Value of 'sources' |
|-------|--------------------|
| Start | undefined |
| After Step 2 | [{'name': 'raw_data'}] |
| After Step 3 | [{'name': 'raw_data', 'tables': []}] |
| After Step 4 | [{'name': 'raw_data', 'tables': [{'name': 'users'}]}] |
| After Step 5 | [{'name': 'raw_data', 'tables': [{'name': 'users', 'description': 'User data from app'}]}] |
| Final | [{'name': 'raw_data', 'tables': [{'name': 'users', 'description': 'User data from app'}]}] |
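The trace can be replayed by hand in plain Python. This is only a sketch of the resulting data structure under the assumption that each YAML line maps to one mutation; it is not how dbt's parser is implemented, and the `snapshots` list is a hypothetical helper for recording the state after each step.

```python
import copy

# Replay the execution steps manually. This mirrors the resulting
# structure only; it is not dbt's actual parsing code.
snapshots = []  # hypothetical helper: state of `sources` after each step

sources = []                                     # step 1: 'sources:'
snapshots.append(copy.deepcopy(sources))

sources.append({"name": "raw_data"})             # step 2: '- name: raw_data'
snapshots.append(copy.deepcopy(sources))

sources[0]["tables"] = []                        # step 3: 'tables:'
snapshots.append(copy.deepcopy(sources))

sources[0]["tables"].append({"name": "users"})   # step 4: '- name: users'
snapshots.append(copy.deepcopy(sources))

# step 5: "description: 'User data from app'"
sources[0]["tables"][0]["description"] = "User data from app"
snapshots.append(copy.deepcopy(sources))

for step, state in enumerate(snapshots, start=1):
    print(f"after step {step}: {state}")
```

Deep copies are taken so that later mutations do not alter the earlier snapshots.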
Key Moments - 2 Insights
Why do we indent 'tables' under the source name?
Because YAML uses indentation to show hierarchy. 'tables' belongs to the source 'raw_data', so it must be indented under it as shown in step 3 of the execution table.
What happens if we forget to add a description for a table?
dbt will still recognize the table; it just won't carry description metadata. Step 5 shows that adding a description is optional but helpful.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the state of 'sources' after step 4?
A. [{'name': 'raw_data', 'tables': []}]
B. [{'name': 'raw_data', 'tables': [{'name': 'users'}]}]
C. [{'name': 'raw_data'}]
D. undefined
💡 Hint
Check the 'State Change' and 'Result' columns at step 4 in the execution table.
At which step is the table description added?
A. Step 3
B. Step 2
C. Step 5
D. Step 6
💡 Hint
Look for the 'Add description' action in the execution table.
If we remove the 'tables' key, what would happen to the source configuration?
A. dbt will have no tables under the source, so no tables to reference.
B. The source will be invalid and cause an error.
C. The source will automatically add default tables.
D. The source will be ignored completely.
💡 Hint
Think about the role of 'tables' in the Variable Tracker and the Execution Table.
Concept Snapshot
Configuring sources in YAML for dbt:
- Use 'sources:' at top level
- Define each source with '- name:'
- Under each source, add 'tables:' list
- Each table has '- name:' and optional 'description:'
- Indentation shows hierarchy
- dbt reads this to know where data comes from
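The checklist above can be expressed as a small structural check over an already-parsed config. This is a minimal sketch: `check_sources` is a hypothetical helper, it operates on a plain Python dict rather than on a YAML file, and it covers only the keys named in the snapshot, not dbt's full schema validation.

```python
def check_sources(config):
    """Return a list of problems found in a parsed sources config (a dict)."""
    sources = config.get("sources")
    if not isinstance(sources, list):
        return ["top-level 'sources' must be a list"]

    problems = []
    for i, source in enumerate(sources):
        if not isinstance(source, dict) or "name" not in source:
            problems.append(f"sources[{i}] needs a 'name' key")
            continue
        # 'tables' may be missing or empty; treat both as "no tables".
        for j, table in enumerate(source.get("tables") or []):
            if not isinstance(table, dict) or "name" not in table:
                problems.append(
                    f"source '{source['name']}': tables[{j}] needs a 'name' key"
                )
    return problems


good = {"sources": [{"name": "raw_data", "tables": [{"name": "users"}]}]}
bad = {"sources": [{"name": "raw_data", "tables": ["users"]}]}  # table lacks 'name'

print(check_sources(good))
print(check_sources(bad))
```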
Full Transcript
This visual execution shows how to configure sources in YAML for dbt. We start by creating a 'sources' key, then add a source name. Under that source, we add a 'tables' list. Each table has a name and can have a description. Indentation is important to show the structure. The execution table traces each step of parsing the YAML lines and how the internal data structure changes. The variable tracker shows how the 'sources' variable builds up step by step. Key moments clarify why indentation matters and the role of descriptions. The quiz tests understanding of the state after steps and the effect of missing keys. This helps beginners see exactly how dbt reads source configs from YAML.

Practice

1. What is the main purpose of configuring sources in a dbt YAML file?
easy
A. To write SQL queries for data transformation
B. To tell dbt where to find raw data tables
C. To create dashboards for data visualization
D. To schedule dbt runs automatically

Solution

  1. Step 1: Understand the role of source configuration

    Source configuration in dbt YAML files defines where raw data tables are located in the database.
  2. Step 2: Differentiate from other dbt tasks

    Writing SQL queries and scheduling runs are done elsewhere, not in source YAML files.
  3. Final Answer:

    To tell dbt where to find raw data tables -> Option B
  4. Quick Check:

    Source config = raw data location [OK]
Hint: Sources define raw table locations in YAML [OK]
Common Mistakes:
  • Confusing source config with SQL model code
  • Thinking sources schedule runs
  • Assuming sources create visualizations
2. Which of the following is the correct syntax to define a source in a dbt YAML file?
easy
A.
    source:
      name: raw_data
      table:
        - customers
B.
    sources:
      name: raw_data
      tables:
        - customers
C.
    sources:
      - name: raw_data
        tables:
          - name: customers
D.
    source:
      - raw_data:
          tables:
            - customers

Solution

  1. Step 1: Recall correct YAML source structure

    The correct syntax uses 'sources' as a list with 'name' and nested 'tables' list, each with a 'name'.
  2. Step 2: Compare options to syntax

    Option C matches: 'sources' is a list, its entry has a 'name', and each entry under 'tables' has its own 'name'. The other options use the singular 'source', drop the list structure, or omit the table 'name' key.
  3. Final Answer:

    A 'sources' list whose entry has a 'name' and a 'tables' list of named tables -> Option C
  4. Quick Check:

    Correct keys and nesting = 'sources' list, 'name', 'tables' list with 'name' [OK]
Hint: Look for 'sources' list with 'name' and 'tables' keys [OK]
Common Mistakes:
  • Using singular 'source' instead of 'sources'
  • Missing 'name' key for tables
  • Incorrect indentation breaking YAML structure
3. Given this YAML snippet, what is the value of 'loaded_at_field' set on the 'transactions' table of the source 'sales_data'?
sources:
  - name: sales_data
    tables:
      - name: transactions
        loaded_at_field: transaction_date
medium
A. transaction_date
B. transactions
C. loaded_at_field
D. sales_data

Solution

  1. Step 1: Locate the 'loaded_at_field' key in YAML

    It is nested under the 'transactions' table inside the 'sales_data' source.
  2. Step 2: Identify the value assigned

    The value assigned to 'loaded_at_field' is 'transaction_date'.
  3. Final Answer:

    transaction_date -> Option A
  4. Quick Check:

    loaded_at_field value = transaction_date [OK]
Hint: Find 'loaded_at_field' key's value under table [OK]
Common Mistakes:
  • Confusing source name with field value
  • Picking table name instead of field value
  • Misreading YAML indentation levels
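The lookup in this solution amounts to walking from the source to the table and then reading one key. A minimal sketch, assuming the YAML has already been parsed into a dict; `table_property` is a hypothetical helper, not a dbt function.

```python
# The snippet from the question, written as the dict a YAML parser would produce.
config = {
    "sources": [
        {
            "name": "sales_data",
            "tables": [
                {"name": "transactions", "loaded_at_field": "transaction_date"}
            ],
        }
    ]
}


def table_property(config, source_name, table_name, key):
    """Walk sources -> tables and return one table-level property, or None."""
    for source in config.get("sources") or []:
        if source.get("name") != source_name:
            continue
        for table in source.get("tables") or []:
            if table.get("name") == table_name:
                return table.get(key)
    return None


value = table_property(config, "sales_data", "transactions", "loaded_at_field")
print(value)
```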
4. Identify the error in this source configuration YAML:
sources:
  - name: marketing_data
    tables:
      - name: leads
        freshness:
          warn_after:
            count: 12
            period: hours
          error_after:
            count: 1
            period: days
medium
A. 'warn_after' and 'error_after' counts are reversed
B. The indentation under 'freshness' is incorrect
C. The 'error_after' period should be less than 'warn_after'
D. The 'period' values must be singular strings

Solution

  1. Step 1: Understand dbt freshness period syntax

    For 'period', dbt accepts only the singular values 'minute', 'hour', and 'day'. Plural forms ('hours', 'days') are invalid and cause errors.
  2. Step 2: Check the YAML periods

    'period: hours' and 'period: days' use plural, which dbt does not recognize.
  3. Step 3: Rule out other options

    A: The counts are in a sensible order (warn after 12 hours, error after 1 day, i.e. 24 hours). B: The indentation is correct. C: This is backwards; the 'error_after' threshold should be longer than 'warn_after', as it is here.
  4. Final Answer:

    The 'period' values must be singular strings -> Option D
  5. Quick Check:

    period: hour/day (singular only) [OK]
Hint: dbt freshness periods must be singular (hour, day) [OK]
Common Mistakes:
  • Using plural periods ('hours', 'days')
  • Incorrect YAML indentation
  • Thinking error_after time should be shorter than warn_after
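The two rules from this solution (singular periods only, and a warn threshold shorter than the error threshold) can be sketched in a few lines. This is a simplified model for illustration, not dbt's implementation; `threshold` and `freshness_status` are hypothetical names.

```python
from datetime import timedelta

# Singular period names accepted in this sketch, mapped to timedelta keywords.
VALID_PERIODS = {"minute": "minutes", "hour": "hours", "day": "days"}


def threshold(rule):
    """Convert {'count': 12, 'period': 'hour'} into a timedelta."""
    period = rule["period"]
    if period not in VALID_PERIODS:
        raise ValueError(f"invalid period {period!r}; use minute, hour, or day")
    return timedelta(**{VALID_PERIODS[period]: rule["count"]})


def freshness_status(age, freshness):
    """Classify data of the given age as 'pass', 'warn', or 'error'."""
    if "error_after" in freshness and age > threshold(freshness["error_after"]):
        return "error"
    if "warn_after" in freshness and age > threshold(freshness["warn_after"]):
        return "warn"
    return "pass"


# The corrected config from the question, with singular periods.
freshness = {
    "warn_after": {"count": 12, "period": "hour"},
    "error_after": {"count": 1, "period": "day"},
}

for hours in (6, 13, 30):
    print(hours, freshness_status(timedelta(hours=hours), freshness))
```

The error threshold is checked first so that data older than both limits is reported as an error rather than a warning.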
5. You want to add a test to ensure the 'email' column in the 'users' table source is never null. Which YAML snippet correctly adds this test?
hard
A.
    sources:
      - name: app_data
        tables:
          - name: users
            columns:
              - name: email
                tests:
                  - not_null
B.
    sources:
      - name: app_data
        tables:
          - name: users
            tests:
              - column: email
                test: not_null
C.
    sources:
      - name: app_data
        tables:
          - users:
              columns:
                - email:
                    tests:
                      - not_null
D.
    sources:
      - name: app_data
        tables:
          - name: users
            columns:
              - email
                test: not_null

Solution

  1. Step 1: Recall correct test syntax in source YAML

    Tests are added under 'columns' with 'name' and a 'tests' list containing test names.
  2. Step 2: Check each option's structure

    Option A correctly uses a 'columns' list whose entry has 'name: email' and a 'tests' list containing 'not_null'.
  3. Step 3: Identify errors in other options

    Option B puts 'tests' directly under the table and uses the wrong keys ('column', 'test'). Option C nests 'users' and 'email' as mapping keys instead of using 'name'. Option D uses 'test' instead of 'tests' and omits 'name' for the column.
  4. Final Answer:

    A 'columns' entry named 'email' with a 'tests' list containing 'not_null' -> Option A
  5. Quick Check:

    Tests under columns with a 'tests' list = Option A [OK]
Hint: Tests go under columns with 'tests' list [OK]
Common Mistakes:
  • Using 'test' instead of 'tests'
  • Wrong nesting of columns and tests
  • Misnaming keys like 'column' instead of 'name'
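To see why the nesting in the correct answer matters, the sketch below walks a parsed config and collects every column that declares a given test. It assumes the YAML is already loaded into a dict; `columns_with_test` is a hypothetical helper, not part of dbt.

```python
# The correct answer, written as the dict a YAML parser would produce.
config = {
    "sources": [
        {
            "name": "app_data",
            "tables": [
                {
                    "name": "users",
                    "columns": [{"name": "email", "tests": ["not_null"]}],
                }
            ],
        }
    ]
}


def columns_with_test(config, test_name):
    """Return (source, table, column) triples whose 'tests' list has test_name."""
    found = []
    for source in config.get("sources") or []:
        for table in source.get("tables") or []:
            for column in table.get("columns") or []:
                for test in column.get("tests") or []:
                    # A test entry may be a bare name or a one-key mapping
                    # (used for tests that take arguments).
                    name = test if isinstance(test, str) else next(iter(test))
                    if name == test_name:
                        found.append(
                            (source["name"], table["name"], column["name"])
                        )
    return found


print(columns_with_test(config, "not_null"))
```

With the key misspelled as 'test', or with 'tests' placed directly under the table, the walk finds nothing, which mirrors why the other options fail.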