is_incremental() macro in dbt - Time & Space Complexity
We want to understand how the time needed to run a dbt model changes when it uses the is_incremental() macro.
This shows how the model behaves on a full first run versus an incremental run that only processes new data.
Analyze the time complexity of this dbt model snippet using is_incremental():
```sql
{{ config(materialized='incremental') }}

select * from source_table

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```
This code selects all rows on first run, then only new or updated rows on incremental runs.
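In practice, incremental models usually also declare a unique_key so that re-processed rows are merged into the target rather than duplicated. A minimal sketch of that variant, assuming the source has an id column that uniquely identifies each row (the column name is an assumption, not from the original snippet):

```sql
{{ config(
    materialized='incremental',
    unique_key='id'  -- assumption: 'id' uniquely identifies a row in source_table
) }}

select id, updated_at, payload
from source_table

{% if is_incremental() %}
-- only rows newer than what the target table already holds
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

With a unique_key, incremental runs pay an extra merge/upsert cost on top of the scan, but that cost also scales with the number of new rows, not the total table size.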
Look at what repeats when the model runs:
- Primary operation: scanning rows in source_table.
- How many times: once over the full table on the first run; only over new rows on incremental runs.
When running fully, the model reads all rows, so time grows with total rows.
| Input Size (rows) | Approx. Operations |
|---|---|
| 10 | 10 rows scanned |
| 100 | 100 rows scanned |
| 1000 | 1000 rows scanned |
On incremental runs, only new rows are processed, so time grows with the size of the new data rather than the total data. Note that the max(updated_at) subquery still runs against the existing target table; it stays cheap when updated_at is indexed or the target is partitioned on it.
Time Complexity: O(n)
This means the time grows linearly with the number of rows processed each run.
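When the high-water-mark subquery itself becomes expensive, a common variant bounds the filter with a fixed lookback window instead of computing max(updated_at) on every run. A sketch, assuming updated_at is a timestamp and a three-day reprocessing window is acceptable (the window length is an illustrative assumption; dbt.dateadd is the cross-database macro available in dbt Core 1.2+, earlier via dbt_utils):

```sql
{{ config(materialized='incremental', unique_key='id') }}

select * from source_table

{% if is_incremental() %}
-- reprocess a fixed window rather than querying max(updated_at);
-- unique_key deduplicates rows that fall inside the overlap
where updated_at > {{ dbt.dateadd('day', -3, 'current_timestamp') }}
{% endif %}
```

The trade-off: each incremental run scans a bounded, predictable slice of recent data, at the cost of re-reading some rows the previous run already handled.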
[X] Wrong: "The is_incremental() macro makes the model always run faster regardless of data size."
[OK] Correct: The macro only limits rows processed to new data, so if many new rows appear, the run can still take a long time.
Understanding how incremental logic affects runtime helps you design efficient data pipelines and shows that you think about how data workflows scale.
"What if the filter inside is_incremental() used a different column that is not indexed? How would the time complexity change?"
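One way to make that question concrete: the filter still returns the right rows, but the scan no longer shrinks. A sketch, assuming a hypothetical last_modified column with no index or partitioning on the source table:

```sql
{{ config(materialized='incremental') }}

select * from source_table

{% if is_incremental() %}
-- hypothetical: 'last_modified' is not indexed, so the database must
-- read every row of source_table to evaluate this predicate; scan time
-- stays O(total rows) even though far fewer rows are returned and written
where last_modified > (select max(last_modified) from {{ this }})
{% endif %}
```

In that case the rows written still scale with the new data, but the read side degrades back toward the full-refresh cost, which is why incremental filters are usually placed on indexed, clustered, or partition columns.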