Why Jinja makes SQL dynamic in dbt - Performance Analysis
We want to understand how using Jinja to make SQL dynamic affects the time it takes to run queries.
Specifically, how does adding dynamic parts change the work done as input grows?
Analyze the time complexity of the following dbt model using Jinja.
{{ config(materialized='table') }}
select *
from {{ source('sales', 'orders') }}
{% if var('filter_date') %}
where order_date >= '{{ var('filter_date') }}'
{% endif %}
This code builds a SQL query that optionally filters orders by date if a variable is set.
Look at what repeats or loops in this code.
- Primary operation: The database scans rows from the orders table.
- How many times: Once per query execution, scanning all or filtered rows.
The database work grows with the number of rows in the orders table.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | Scan 10 rows or fewer if filtered |
| 100 | Scan 100 rows or fewer if filtered |
| 1000 | Scan 1000 rows or fewer if filtered |
Pattern observation: The work grows roughly in direct proportion to the number of rows scanned.
Time Complexity: O(n)
This means the query time grows linearly with the number of rows processed.
[X] Wrong: "Adding Jinja makes the query run slower because it adds loops in the code."
[OK] Correct: Jinja runs once before the query runs, generating SQL. The database does the actual row processing, so Jinja itself does not add repeated work as data grows.
Understanding how templating affects query performance shows you can separate code generation from data processing, a key skill in data engineering and analytics.
"What if we added a loop in Jinja to generate multiple similar queries? How would that affect the overall time complexity?"