Bird
Raised Fist0
dbtdata~5 mins

source() function for raw tables in dbt

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Introduction

The source() function helps you tell dbt where your raw data tables are. It connects your project to the original data so you can use it safely and clearly.

When you want to use raw data tables from your database in your dbt models.
When you want to document and track the source of your data for clarity and auditing.
When you want to build transformations on top of raw tables without hardcoding table names.
When you want to make your dbt project easier to maintain by referencing sources consistently.
Syntax
dbt
source('source_name', 'table_name')

source_name is the name you gave to your data source in your sources.yml file.

table_name is the raw table name inside that source.

Examples
This selects all data from the customers table inside the raw_data source.
dbt
select * from {{ source('raw_data', 'customers') }}
This gets the id and name columns from the orders table in the sales_db source, filtering for completed orders.
dbt
select id, name from {{ source('sales_db', 'orders') }} where status = 'completed'
Sample Program

This example shows how to define a source called raw_data with a table customers in the sources.yml file. Then, in a dbt model, it selects the id and email of active customers from that source.

dbt
/* In your sources.yml file */
version: 2

sources:
  - name: raw_data
    schema: raw_data
    tables:
      - name: customers

/* In your dbt model SQL file */
select id, email from {{ source('raw_data', 'customers') }} where active = true
OutputSuccess
Important Notes

Always define your sources in sources.yml before using source() in your models.

Using source() helps dbt track data lineage and improves documentation.

Summary

source() connects your dbt models to raw tables safely.

It uses names from your sources.yml file for clarity and maintenance.

It helps you write cleaner, more understandable SQL in dbt projects.

Practice

(1/5)
1. What is the main purpose of the source() function in dbt?
easy
A. To create new tables in the database
B. To run Python scripts inside dbt models
C. To delete raw tables from the database
D. To reference raw tables defined in the sources.yml file

Solution

  1. Step 1: Understand the role of source()

    The source() function is used to safely reference raw tables that are defined in the sources.yml file.
  2. Step 2: Differentiate from other dbt functions

    It does not create or delete tables, nor run scripts. It only connects models to existing raw data tables.
  3. Final Answer:

    To reference raw tables defined in the sources.yml file -> Option D
  4. Quick Check:

    source() connects to raw tables [OK]
Hint: Remember: source() links to raw tables only [OK]
Common Mistakes:
  • Thinking source() creates tables
  • Confusing source() with model creation
  • Assuming source() runs scripts
2. Which of the following is the correct syntax to reference a raw table named customers in the source named raw_data using source() in a dbt model?
easy
A. select * from source.raw_data.customers
B. select * from {{ source('raw_data', 'customers') }}
C. select * from source['raw_data']['customers']
D. select * from source(raw_data, customers)

Solution

  1. Step 1: Recall source() function syntax

    The correct syntax uses two string arguments: the source name and the table name, both in quotes, wrapped in {{ }}.
  2. Step 2: Check each option

    The valid syntax is select * from {{ source('raw_data', 'customers') }}. Dot notation, unquoted arguments, and bracket notation are all invalid in dbt.
  3. Final Answer:

    select * from {{ source('raw_data', 'customers') }} -> Option B
  4. Quick Check:

    {{ source() }} with quoted arguments [OK]
Hint: Always use quotes around source and table names [OK]
Common Mistakes:
  • Omitting quotes around source or table names
  • Using dot or bracket notation instead of function call
  • Passing variables without quotes
3. Given the following dbt model SQL code:
select id, name from {{ source('sales_db', 'customers') }} where active = true

What does this query do?
medium
A. Selects all customers from the customers table in the sales_db source where active is true
B. Creates a new table named customers in sales_db
C. Deletes inactive customers from the customers table
D. Selects all customers from a model named sales_db

Solution

  1. Step 1: Understand the source() usage

    The code references the raw table customers inside the source sales_db.
  2. Step 2: Analyze the SQL query

    The query selects id and name columns where active is true, filtering active customers.
  3. Final Answer:

    Selects all customers from the customers table in the sales_db source where active is true -> Option A
  4. Quick Check:

    source() reads raw tables, query filters active customers [OK]
Hint: Look for source() to identify raw table references [OK]
Common Mistakes:
  • Thinking source() creates or deletes tables
  • Confusing source with models
  • Ignoring the WHERE clause filtering
4. You wrote this dbt model code:
select * from source('marketing', 'leads')

but dbt throws an error: Compilation Error: 'source' is undefined. What is the most likely cause?
medium
A. You used single quotes instead of double quotes inside source()
B. The table leads does not exist in the database
C. You forgot to wrap source() in double curly braces {{ }}
D. The marketing source is not defined in sources.yml

Solution

  1. Step 1: Check dbt Jinja syntax

    dbt requires Jinja functions like source() to be inside double curly braces: {{ source(...) }}.
  2. Step 2: Understand the error message

    The error says source is undefined, meaning dbt treats it as plain SQL, not a Jinja function.
  3. Final Answer:

    You forgot to wrap source() in double curly braces {{ }} -> Option C
  4. Quick Check:

    Use {{ source() }} to call source function [OK]
Hint: Always use {{ }} around dbt functions like source() [OK]
Common Mistakes:
  • Writing source() without {{ }}
  • Assuming quotes type causes error
  • Ignoring missing source definition errors
5. You want to create a dbt model that selects only customers from the raw_data source's customers table who joined after 2023-01-01. Which of the following is the correct way to write this using source()?
hard
A. select * from {{ source('raw_data', 'customers') }} where join_date > '2023-01-01'
B. select * from source('raw_data', 'customers') where join_date > '2023-01-01'
C. select * from {{ source('raw_data', customers) }} where join_date > '2023-01-01'
D. select * from {{ source('raw_data', 'customers') }} where join_date > 2023-01-01

Solution

  1. Step 1: Use correct source() syntax with Jinja braces

    The source() function must be inside {{ }} and both source and table names must be strings in quotes.
  2. Step 2: Use correct date format in SQL condition

    The date value must be a string in quotes to compare properly in SQL.
  3. Final Answer:

    select * from {{ source('raw_data', 'customers') }} where join_date > '2023-01-01' -> Option A
  4. Quick Check:

    Correct syntax and date string format [OK]
Hint: Wrap source() in {{ }}, use quotes for strings and dates [OK]
Common Mistakes:
  • Missing {{ }} around source()
  • Not quoting table name or date string
  • Using unquoted date causing SQL error