source() function for raw tables in dbt - Time & Space Complexity
Start learning this pattern below
Jump into concepts and practice - no test required
We want to understand how the time to run a dbt model using the source() function changes as the size of the raw table grows.
Specifically, how does the data volume affect the work dbt does when reading from a raw table?
Analyze the time complexity of the following dbt code snippet.
select *
from {{ source('raw_schema', 'raw_table') }}
where event_date >= '2024-01-01'
This code reads data from a raw table using source() and filters rows by date.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Scanning rows in the raw table to apply the date filter.
- How many times: Once for each row in the raw table.
As the number of rows in the raw table grows, the number of rows scanned grows at the same rate.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row checks |
| 100 | About 100 row checks |
| 1000 | About 1000 row checks |
Pattern observation: The work grows directly with the number of rows; doubling rows doubles work.
Time Complexity: O(n)
This means the time to run grows linearly with the number of rows in the raw table.
[X] Wrong: "Using source() is instant and does not depend on table size."
[OK] Correct: The source() function points to the raw table, so the database must scan rows to apply filters, which takes longer as the table grows.
Understanding how data size affects query time is key for writing efficient dbt models and working with raw data sources.
What if we added an index on event_date? How would the time complexity change?
Practice
source() function in dbt?Solution
Step 1: Understand the role of
Thesource()source()function is used to safely reference raw tables that are defined in thesources.ymlfile.Step 2: Differentiate from other dbt functions
It does not create or delete tables, nor run scripts. It only connects models to existing raw data tables.Final Answer:
To reference raw tables defined in thesources.ymlfile -> Option DQuick Check:
source()connects to raw tables [OK]
source() links to raw tables only [OK]- Thinking
source()creates tables - Confusing
source()with model creation - Assuming
source()runs scripts
customers in the source named raw_data using source() in a dbt model?Solution
Step 1: Recall
The correct syntax uses two string arguments: the source name and the table name, both in quotes, wrapped insource()function syntax{{ }}.Step 2: Check each option
The valid syntax isselect * from {{ source('raw_data', 'customers') }}. Dot notation, unquoted arguments, and bracket notation are all invalid in dbt.Final Answer:
select * from {{ source('raw_data', 'customers') }} -> Option BQuick Check:
{{ source() }}with quoted arguments [OK]
- Omitting quotes around source or table names
- Using dot or bracket notation instead of function call
- Passing variables without quotes
select id, name from {{ source('sales_db', 'customers') }} where active = trueWhat does this query do?
Solution
Step 1: Understand the
The code references the raw tablesource()usagecustomersinside the sourcesales_db.Step 2: Analyze the SQL query
The query selectsidandnamecolumns whereactiveis true, filtering active customers.Final Answer:
Selects all customers from thecustomerstable in thesales_dbsource whereactiveis true -> Option AQuick Check:
source()reads raw tables, query filters active customers [OK]
source() to identify raw table references [OK]- Thinking
source()creates or deletes tables - Confusing source with models
- Ignoring the WHERE clause filtering
select * from source('marketing', 'leads')but dbt throws an error:
Compilation Error: 'source' is undefined. What is the most likely cause?Solution
Step 1: Check dbt Jinja syntax
dbt requires Jinja functions likesource()to be inside double curly braces:{{ source(...) }}.Step 2: Understand the error message
The error sayssourceis undefined, meaning dbt treats it as plain SQL, not a Jinja function.Final Answer:
You forgot to wrapsource()in double curly braces {{ }} -> Option CQuick Check:
Use{{ source() }}to call source function [OK]
- Writing source() without {{ }}
- Assuming quotes type causes error
- Ignoring missing source definition errors
raw_data source's customers table who joined after 2023-01-01. Which of the following is the correct way to write this using source()?Solution
Step 1: Use correct
Thesource()syntax with Jinja bracessource()function must be inside{{ }}and both source and table names must be strings in quotes.Step 2: Use correct date format in SQL condition
The date value must be a string in quotes to compare properly in SQL.Final Answer:
select * from {{ source('raw_data', 'customers') }} where join_date > '2023-01-01' -> Option AQuick Check:
Correct syntax and date string format [OK]
- Missing {{ }} around source()
- Not quoting table name or date string
- Using unquoted date causing SQL error
