Full Refresh vs. Incremental in dbt: A Performance Comparison
When using dbt, we often choose between full-refresh and incremental runs. We want to know how run time changes as the data grows, so let's analyze the time complexity of these two dbt model configurations.
```sql
-- Full refresh: a table model is dropped and rebuilt on every run
{{ config(materialized='table') }}

select * from source_table
```
```sql
-- Incremental: processes only rows changed since the last run
{{ config(materialized='incremental') }}

select * from source_table

{% if is_incremental() %}
  -- Apply the filter only on incremental runs; on the first run
  -- {{ this }} does not exist yet, so the whole table is built.
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```
The first rebuilds the whole table on every run; the second inserts only rows that are new or changed since the last run. To compare their costs, look at what work repeats each time the models run:
- Primary operation: Reading and processing rows from source_table.
- How many times: Full refresh reads all rows every run; incremental reads only new or updated rows.
Think about how the number of rows affects run time.
| Source Table Size (rows) | Full Refresh: rows processed | Incremental: rows processed (~1% new) |
|---|---|---|
| 10,000 | 10,000 (all) | ~100 |
| 100,000 | 100,000 (all) | ~1,000 |
| 1,000,000 | 1,000,000 (all) | ~10,000 |
Full refresh time grows directly with total data size. Incremental time grows with new data size, usually much smaller.
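To make this concrete, here is a minimal Python sketch that reproduces the comparison above. The 1% change rate is an illustrative assumption, not a dbt internal:

```python
# Rows processed per run as the source table grows, assuming that
# roughly 1% of rows are new or changed between runs (an illustrative
# figure; real change rates depend on the workload).
def rows_processed(n_total: int, change_rate: float = 0.01) -> dict:
    k_new = int(n_total * change_rate)
    return {"full_refresh": n_total, "incremental": k_new}

for n in (10_000, 100_000, 1_000_000):
    counts = rows_processed(n)
    print(f"{n:>9,} rows: full refresh scans {counts['full_refresh']:,}, "
          f"incremental scans {counts['incremental']:,}")
```

As the table grows 100x, the full-refresh cost grows 100x with it, while the incremental cost grows only with the (much smaller) batch of new rows.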
Time Complexity: O(n) per run for full refresh and O(k) per run for incremental, where n is the total row count, k is the number of new or changed rows, and typically k << n. Full-refresh work grows with the entire table; incremental work grows only with the data added since the last run.
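The difference also compounds over repeated runs. A hedged sketch, assuming a fixed batch of k rows arrives before each run (so the table holds i*k rows at run i):

```python
def total_rows_scanned(runs: int, batch: int) -> tuple:
    """Cumulative rows scanned over `runs` runs when `batch` new rows
    arrive before each run. Full refresh rescans the whole table every
    time (O(r^2 * k) total over r runs); incremental scans only each
    new batch (O(r * k) total)."""
    full = sum(i * batch for i in range(1, runs + 1))
    incremental = runs * batch
    return full, incremental

full, inc = total_rows_scanned(runs=100, batch=1_000)
print(full, inc)  # 5050000 100000
```

After 100 runs with 1,000 new rows each, full refresh has scanned roughly 50x more rows in total, and the gap keeps widening with every run.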
[X] Wrong: "Incremental runs always take the same time as full refresh."
[OK] Correct: Incremental processes only new or changed rows, so its run time tracks the volume of new data rather than the total table size.
Understanding how data size affects dbt runs helps you design efficient data pipelines and explain your choices clearly.
What if the incremental filter is missing or incorrect? How would that affect the time complexity?
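One way to explore that question: model a broken filter as one that fails to exclude old rows, so every run scans all n rows and the per-run cost silently degrades from O(k) back to O(n). This toy function (the row counts are hypothetical) makes the degradation visible:

```python
def incremental_rows(n_total: int, k_new: int, filter_ok: bool) -> int:
    # With a correct updated_at filter, only the k new/changed rows are
    # scanned. If the filter is missing or wrong, the scan falls back to
    # all n rows, making the "incremental" model O(n) per run -- the same
    # cost as a full refresh, but without the clean rebuild.
    return k_new if filter_ok else n_total

print(incremental_rows(1_000_000, 10_000, filter_ok=True))   # 10000
print(incremental_rows(1_000_000, 10_000, filter_ok=False))  # 1000000
```

Note that an incorrect filter can also be worse than slow: if it excludes rows it should include, data is silently dropped, which is a correctness problem on top of the performance one.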