
Configuring sources in YAML in dbt - Quick Revision & Summary

Recall & Review
beginner
What is the purpose of configuring sources in YAML in dbt?
Configuring sources in YAML tells dbt where your raw data lives. It declares which existing warehouse tables (or views) your transformations read as inputs.
beginner
In a dbt source configuration YAML, what key is used to list the tables?
The 'tables' key lists the tables that belong to a source. Each entry in the list needs at least a 'name'.
intermediate
How do you specify the database and schema for a source in dbt YAML?
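You set the 'database' and 'schema' keys on the source entry, at the same level as its 'name'. If 'schema' is omitted, dbt uses the source name as the schema; if 'database' is omitted, dbt uses the target database.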
beginner
What is the benefit of adding descriptions to sources and tables in YAML?
Adding descriptions helps document your data sources clearly. It makes it easier for anyone reading the project to understand what each source and table represents.
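
As a minimal sketch, descriptions can be attached at the source, table, and column level. The names used here ('raw_data', 'users', 'email') and the description text are illustrative assumptions only.

```yaml
sources:
  - name: raw_data            # illustrative source name
    description: Raw tables loaded from the application database.
    tables:
      - name: users
        description: One row per registered user.
        columns:
          - name: email
            description: Contact address supplied at sign-up.
```

These descriptions are what the generated project documentation shows for each source, table, and column.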
beginner
Show a simple example of a source configuration in YAML for a source named 'raw_data' with one table 'users'.
Example:
sources:
  - name: raw_data
    database: analytics_db
    schema: public
    tables:
      - name: users
        description: 'User information table'
Which key in a dbt YAML source config lists the tables?
A. tables
B. columns
C. sources
D. models
Where do you specify the schema for a source in dbt YAML?
A. Under the source name using the 'schema' key
B. Inside each table definition
C. In the dbt_project.yml file
D. In the model SQL files
Why add descriptions to sources and tables in YAML?
A. To change table names
B. To speed up data loading
C. To define SQL queries
D. To improve documentation and clarity
What is the top-level key used to define sources in a dbt YAML file?
A. models
B. sources
C. tables
D. schemas
In dbt, what does a source configuration NOT include?
A. Database and schema location
B. List of tables
C. SQL transformation logic
D. Descriptions
Explain how to configure a source in dbt using YAML. Include keys you would use and why.
Think about how you tell dbt where your raw data lives and what tables it includes.
Describe the benefits of documenting sources and tables with descriptions in your YAML configuration.
Why is it good to add notes about your data?

      Practice

      1. What is the main purpose of configuring sources in a dbt YAML file?
      easy
      A. To write SQL queries for data transformation
      B. To tell dbt where to find raw data tables
      C. To create dashboards for data visualization
      D. To schedule dbt runs automatically

      Solution

      1. Step 1: Understand the role of source configuration

        Source configuration in dbt YAML files defines where raw data tables are located in the database.
      2. Step 2: Differentiate from other dbt tasks

        Writing SQL queries and scheduling runs are done elsewhere, not in source YAML files.
      3. Final Answer:

        To tell dbt where to find raw data tables -> Option B
      4. Quick Check:

        Source config = raw data location [OK]
      Hint: Sources define raw table locations in YAML [OK]
      Common Mistakes:
      • Confusing source config with SQL model code
      • Thinking sources schedule runs
      • Assuming sources create visualizations
      2. Which of the following is the correct syntax to define a source in a dbt YAML file?
      easy
      A. source: name: raw_data table: - customers
      B. sources: name: raw_data tables: - customers
      C. sources: - name: raw_data tables: - name: customers
      D. source: - raw_data: tables: - customers

      Solution

      1. Step 1: Recall correct YAML source structure

        The correct syntax uses 'sources' as a list with 'name' and nested 'tables' list, each with a 'name'.
      2. Step 2: Compare options to syntax

        Option C is the only one that uses the plural 'sources' key, a list entry with 'name', and a 'tables' list whose entries each have a 'name'.
      3. Final Answer:

        sources: - name: raw_data tables: - name: customers -> Option C
      4. Quick Check:

        Correct YAML keys and indentation = sources: - name: raw_data tables: - name: customers [OK]
      Hint: Look for 'sources' list with 'name' and 'tables' keys [OK]
      Common Mistakes:
      • Using singular 'source' instead of 'sources'
      • Missing 'name' key for tables
      • Incorrect indentation breaking YAML structure
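
The options above are flattened onto one line, so the nesting is hard to see. As a sketch, Option C written out with its indentation would look like this (the comments are added for explanation):

```yaml
sources:               # top-level key is plural and holds a list
  - name: raw_data     # each list item describes one source
    tables:            # list of tables that belong to the source
      - name: customers
```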
      3. Given this YAML snippet, what is the value of 'loaded_at_field' for the 'transactions' table in the source 'sales_data'?
      sources:
        - name: sales_data
          tables:
            - name: transactions
              loaded_at_field: transaction_date
      medium
      A. transaction_date
      B. transactions
      C. loaded_at_field
      D. sales_data

      Solution

      1. Step 1: Locate the 'loaded_at_field' key in YAML

        It is nested under the 'transactions' table inside the 'sales_data' source.
      2. Step 2: Identify the value assigned

        The value assigned to 'loaded_at_field' is 'transaction_date'.
      3. Final Answer:

        transaction_date -> Option A
      4. Quick Check:

        loaded_at_field value = transaction_date [OK]
      Hint: Find 'loaded_at_field' key's value under table [OK]
      Common Mistakes:
      • Confusing source name with field value
      • Picking table name instead of field value
      • Misreading YAML indentation levels
      4. Identify the error in this source configuration YAML:
      sources:
        - name: marketing_data
          tables:
            - name: leads
              freshness:
                warn_after:
                  count: 12
                  period: hours
                error_after:
                  count: 1
                  period: days
      medium
      A. 'warn_after' and 'error_after' counts are reversed
      B. The indentation under 'freshness' is incorrect
      C. The 'error_after' period should be less than 'warn_after'
      D. The 'period' values must be singular strings

      Solution

      1. Step 1: Understand dbt freshness period syntax

        dbt freshness requires singular 'period' values like 'hour', 'day', 'minute'. Plural forms ('hours', 'days') are invalid and cause errors.
      2. Step 2: Check the YAML periods

        'period: hours' and 'period: days' use plural, which dbt does not recognize.
      3. Step 3: Rule out other options

        A: The counts are in a sensible order (warn after 12 hours, error after 1 day, i.e. 24 hours). B: The indentation is correct. C: This is backwards; the 'error_after' threshold should be longer than 'warn_after', not shorter.
      4. Final Answer:

        The 'period' values must be singular strings -> Option D
      5. Quick Check:

        period: hour/day (singular only) [OK]
      Hint: dbt freshness periods must be singular (hour, day) [OK]
      Common Mistakes:
      • Using plural periods ('hours', 'days')
      • Incorrect YAML indentation
      • Thinking error_after time should be shorter than warn_after
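
A corrected version of the snippet might look like the sketch below. The 'loaded_at_field' line is an assumption added here: the original snippet does not define one, but a freshness check generally needs a timestamp column to compare against, and 'created_at' is a hypothetical column name.

```yaml
sources:
  - name: marketing_data
    tables:
      - name: leads
        loaded_at_field: created_at   # hypothetical column, not in the original snippet
        freshness:
          warn_after:
            count: 12
            period: hour              # singular, not 'hours'
          error_after:
            count: 1
            period: day               # singular, not 'days'
```

With a configuration like this, running `dbt source freshness` would warn once the newest row is more than 12 hours old and raise an error after 1 day.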
      5. You want to add a test to ensure the 'email' column in the 'users' table source is never null. Which YAML snippet correctly adds this test?
      hard
      A. sources: - name: app_data tables: - name: users columns: - name: email tests: - not_null
      B. sources: - name: app_data tables: - name: users tests: - column: email test: not_null
      C. sources: - name: app_data tables: - users: columns: - email: tests: - not_null
      D. sources: - name: app_data tables: - name: users columns: - email test: not_null

      Solution

      1. Step 1: Recall correct test syntax in source YAML

        Tests are added under 'columns' with 'name' and a 'tests' list containing test names.
      2. Step 2: Check each option's structure

        sources: - name: app_data tables: - name: users columns: - name: email tests: - not_null correctly uses 'columns' list with 'name' and 'tests' list containing 'not_null'.
      3. Step 3: Identify errors in other options

        Option B puts the test at the table level with the wrong keys ('column' and 'test'). Option C has the wrong nesting: it uses 'users:' and 'email:' as mapping keys instead of entries with a 'name' key. Option D lists 'email' without a 'name' key and uses 'test' instead of 'tests'.
      4. Final Answer:

        sources: - name: app_data tables: - name: users columns: - name: email tests: - not_null -> Option A
      5. Quick Check:

        Tests under columns with 'tests' list = sources: - name: app_data tables: - name: users columns: - name: email tests: - not_null [OK]
      Hint: Tests go under columns with 'tests' list [OK]
      Common Mistakes:
      • Using 'test' instead of 'tests'
      • Wrong nesting of columns and tests
      • Misnaming keys like 'column' instead of 'name'
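
Written out with indentation, Option A would look roughly like this sketch (the comments are added for explanation):

```yaml
sources:
  - name: app_data
    tables:
      - name: users
        columns:
          - name: email      # each column entry needs a 'name'
            tests:           # list of tests applied to this column
              - not_null
```

Newer dbt releases (1.8 onward, as far as the author of this note is aware) also accept the spelling 'data_tests' for this key, so check the documentation for the version you run.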