Why incremental models save time and cost in dbt - Performance Analysis
Incremental models in dbt help us process only new or changed data instead of the entire dataset every time.
We want to understand how this approach affects runtime and compute cost.
Analyze the time complexity of this incremental model code snippet.
```sql
{{ config(
    materialized='incremental',
    unique_key='id'
) }}

select * from source_table

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```
This code selects all rows from the source table on the first run, then only new or updated rows on later runs.
- Primary operation: Scanning rows from the source table.
- How many times: Once for all rows on first run; only for new or updated rows on later runs.
When the model runs the first time, it processes all rows, so time grows with total data size.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 rows scanned |
| 100 | 100 rows scanned |
| 1000 | 1000 rows scanned |
On later runs, only new or changed rows need to be processed, so the work grows with the number of changes rather than the total data size. Note that this holds when the warehouse can prune the scan on updated_at (for example via partitioning or clustering); otherwise the filter is still evaluated against the full source table.
Time Complexity: O(n) on the first run, O(k) on later runs
This means that after the initial build, runtime depends on the number of new or updated rows (k), not the total data size (n).
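The scan-count behavior above can be sketched in plain Python. This is not dbt itself, just a stand-in that mimics the warehouse's work: the first run touches every row (n operations), while a later run picks up only rows newer than the target's high-water mark (k operations, assuming the scan can be pruned as discussed above). The table and column names mirror the snippet.

```python
def full_refresh(source):
    """First run: every source row is scanned. Cost = n."""
    return list(source), len(source)

def incremental_run(source, target):
    """Later runs: only rows newer than max(updated_at) in the
    target are picked up. Cost counted = k new/changed rows."""
    watermark = max(row["updated_at"] for row in target)
    new_rows = [r for r in source if r["updated_at"] > watermark]
    return target + new_rows, len(new_rows)

# First run over 1000 rows: cost grows with n.
source = [{"id": i, "updated_at": i} for i in range(1000)]
target, cost_first = full_refresh(source)

# 5 new rows arrive; the incremental run's cost grows with k, not n.
source += [{"id": 1000 + i, "updated_at": 1000 + i} for i in range(5)]
target, cost_incr = incremental_run(source, target)

print(cost_first, cost_incr)  # 1000 5
```

The high-water-mark comparison (`max(updated_at)`) is exactly what the `where updated_at > (select max(updated_at) from {{ this }})` filter expresses in the model.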
[X] Wrong: "Incremental models always scan the entire dataset every time."
[OK] Correct: Incremental models only scan new or changed data after the first run, saving time and cost.
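The `unique_key='id'` config in the snippet also shapes what happens to re-processed rows. A hedged Python sketch (a simplified stand-in, not dbt's actual merge implementation): with a unique key, an updated row replaces the existing row with the same key; without one, later runs simply append, so a re-processed row becomes a duplicate.

```python
def merge_by_key(target, new_rows, key="id"):
    """unique_key set: upsert, replacing rows with a matching key."""
    by_key = {row[key]: row for row in target}
    for row in new_rows:
        by_key[row[key]] = row  # overwrites the old version of the row
    return list(by_key.values())

def append_only(target, new_rows):
    """no unique_key: new rows are simply appended."""
    return target + new_rows

target = [{"id": 1, "updated_at": 1}]
updated = [{"id": 1, "updated_at": 2}]  # same id, newer timestamp

print(len(merge_by_key(target, updated)))  # 1 row, updated in place
print(len(append_only(target, updated)))   # 2 rows, duplicate id
```

This is why the unique key matters for correctness as well as cost: the append-only path is cheaper per run but leaves deduplication to downstream queries.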
Understanding incremental models shows you can handle large data efficiently, a key skill in real projects.
"What if the incremental model did not have a unique key? How would that affect the time complexity?"