How dbt works (SQL + Jinja + YAML) - Performance & Efficiency
Start learning this pattern below
Jump into concepts and practice - no test required
We want to understand how the time dbt takes to run grows as the size of data or number of models increases.
Specifically, how does dbt's combination of SQL, Jinja, and YAML affect execution time?
Analyze the time complexity of this dbt model snippet.
-- models/example_model.sql
{{ config(materialized='table') }}
select
user_id,
count(*) as total_orders
from {{ ref('orders') }}
where order_date >= '{{ var("start_date") }}'
group by user_id
This code runs a SQL query with Jinja templating and YAML config to build a table from a referenced model.
Look at what repeats as input grows.
- Primary operation: Scanning and grouping rows in the referenced table.
- How many times: Once per run, but the scan touches every row matching the date filter.
As the number of rows in the orders table grows, the query scans more data.
| Input Size (n rows) | Approx. Operations |
|---|---|
| 10 | 10 rows scanned and grouped |
| 100 | 100 rows scanned and grouped |
| 1000 | 1000 rows scanned and grouped |
Pattern observation: The work grows roughly in direct proportion to the number of rows scanned.
Time Complexity: O(n)
This means the time grows linearly with the number of rows processed in the SQL query.
[X] Wrong: "dbt runs all models instantly regardless of data size because it just runs SQL."
[OK] Correct: The SQL query inside dbt still processes data, so bigger tables mean more work and longer run times.
Understanding how dbt runs SQL with templating helps you explain data pipeline performance clearly and confidently.
"What if the model used a more complex join instead of a simple filter? How would the time complexity change?"
Practice
Solution
Step 1: Understand Jinja's purpose in dbt
Jinja is a templating language that allows adding logic like loops and conditions inside SQL files.Step 2: Differentiate roles of SQL, Jinja, and YAML
SQL writes queries, YAML manages configs/docs, and Jinja adds dynamic logic to SQL.Final Answer:
To add logic and dynamic behavior to SQL queries -> Option AQuick Check:
Jinja = logic in SQL [OK]
- Confusing Jinja with YAML for configs
- Thinking Jinja executes SQL queries
- Assuming Jinja writes raw SQL without changes
Solution
Step 1: Recall Jinja syntax for variables
Jinja variables are inserted using double curly braces {{ }} around expressions.Step 2: Identify correct syntax for var function
The correct syntax is {{ var('variable_name') }} to access a variable in dbt.Final Answer:
SELECT * FROM {{ var('table_name') }} -> Option BQuick Check:
Jinja variables use {{ }} [OK]
- Using single curly braces or wrong brackets
- Confusing Jinja tags {% %} with variable insertion {{ }}
- Using square brackets instead of curly braces
SELECT
user_id,
{% if var('include_email', false) %}
email,
{% endif %}
created_at
FROM usersAssuming the variable
include_email is set to true in dbt_project.yml.Solution
Step 1: Check the value of the variable include_email
The variable include_email is true, so the if condition passes and the email column is included.Step 2: Render the SQL with the if block included
The SQL will have user_id, email, and created_at columns selected from users.Final Answer:
SELECT user_id, email, created_at FROM users -> Option DQuick Check:
include_email true means email included [OK]
- Ignoring the variable value and excluding email
- Thinking Jinja syntax causes SQL errors
- Confusing variable default values
models:
my_project:
+materialized: table
users:
+tags: ['important']Why does dbt raise an error when running?
Solution
Step 1: Check YAML indentation rules for dbt configs
In dbt, model configs under a project must be indented properly; 'users' should be at the same level as '+materialized'.Step 2: Identify the indentation error
'users' is indented too far, making it a child of '+materialized' which is invalid.Final Answer:
Because the indentation for 'users' is incorrect under 'my_project' -> Option AQuick Check:
YAML indentation matters for nested configs [OK]
- Ignoring YAML indentation importance
- Thinking '+materialized' is invalid syntax
- Assuming tags cannot be lists
Solution
Step 1: Store the filter value in YAML as a variable
Define 'active_flag: true' in YAML under vars or config to make it accessible.Step 2: Use Jinja to insert the variable in SQL WHERE clause
UseWHERE active = {{ var('active_flag') }}so the SQL filters active users dynamically.Final Answer:
Define 'active_flag: true' in YAML, then use WHERE active = {{ var('active_flag') }} in SQL with Jinja -> Option CQuick Check:
YAML vars + Jinja in SQL = dynamic filters [OK]
- Hardcoding filter in SQL ignoring YAML
- Not using Jinja to insert YAML vars
- Trying to read YAML directly in SQL without var()
