Full Refresh vs. Incremental in dbt: A Performance Comparison
When using dbt, we often choose between full-refresh and incremental runs. We want to know how run time changes as the data grows, so let's analyze the time complexity of these two dbt model configurations.
```sql
-- Full refresh: a table model is dropped and rebuilt on every run
{{ config(materialized='table') }}

select * from source_table
```
```sql
-- Incremental: processes only rows changed since the last run
{{ config(materialized='incremental') }}

select * from source_table

{% if is_incremental() %}
  -- Apply the filter only on incremental runs; on the first run
  -- {{ this }} does not exist yet, so the whole table is built.
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```
The first rebuilds the whole table on every run; the second inserts only rows that are new or changed since the last run. To compare their costs, look at what work repeats each time the models run:
- Primary operation: Reading and processing rows from source_table.
- How many times: Full refresh reads all rows every run; incremental reads only new or updated rows.
Think about how the number of rows affects run time.
| Source Table Size (rows) | Full Refresh: rows processed | Incremental: rows processed (~1% new) |
|---|---|---|
| 10,000 | 10,000 (all) | ~100 |
| 100,000 | 100,000 (all) | ~1,000 |
| 1,000,000 | 1,000,000 (all) | ~10,000 |
Full refresh time grows directly with total data size. Incremental time grows with new data size, usually much smaller.
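To make this concrete, here is a minimal Python sketch that reproduces the comparison above. The 1% change rate is an illustrative assumption, not a dbt internal:

```python
# Rows processed per run as the source table grows, assuming that
# roughly 1% of rows are new or changed between runs (an illustrative
# figure; real change rates depend on the workload).
def rows_processed(n_total: int, change_rate: float = 0.01) -> dict:
    k_new = int(n_total * change_rate)
    return {"full_refresh": n_total, "incremental": k_new}

for n in (10_000, 100_000, 1_000_000):
    counts = rows_processed(n)
    print(f"{n:>9,} rows: full refresh scans {counts['full_refresh']:,}, "
          f"incremental scans {counts['incremental']:,}")
```

As the table grows 100x, the full-refresh cost grows 100x with it, while the incremental cost grows only with the (much smaller) batch of new rows.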
Time Complexity: O(n) per run for full refresh and O(k) per run for incremental, where n is the total row count, k is the number of new or changed rows, and typically k << n. Full-refresh work grows with the entire table; incremental work grows only with the data added since the last run.
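The difference also compounds over repeated runs. A hedged sketch, assuming a fixed batch of k rows arrives before each run (so the table holds i*k rows at run i):

```python
def total_rows_scanned(runs: int, batch: int) -> tuple:
    """Cumulative rows scanned over `runs` runs when `batch` new rows
    arrive before each run. Full refresh rescans the whole table every
    time (O(r^2 * k) total over r runs); incremental scans only each
    new batch (O(r * k) total)."""
    full = sum(i * batch for i in range(1, runs + 1))
    incremental = runs * batch
    return full, incremental

full, inc = total_rows_scanned(runs=100, batch=1_000)
print(full, inc)  # 5050000 100000
```

After 100 runs with 1,000 new rows each, full refresh has scanned roughly 50x more rows in total, and the gap keeps widening with every run.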
[X] Wrong: "Incremental runs always take the same time as full refresh."
[OK] Correct: Incremental processes only new or changed rows, so its run time tracks the volume of new data rather than the total table size.
Understanding how data size affects dbt runs helps you design efficient data pipelines and explain your choices clearly.
What if the incremental filter is missing or incorrect? How would that affect the time complexity?
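One way to explore that question: model a broken filter as one that fails to exclude old rows, so every run scans all n rows and the per-run cost silently degrades from O(k) back to O(n). This toy function (the row counts are hypothetical) makes the degradation visible:

```python
def incremental_rows(n_total: int, k_new: int, filter_ok: bool) -> int:
    # With a correct updated_at filter, only the k new/changed rows are
    # scanned. If the filter is missing or wrong, the scan falls back to
    # all n rows, making the "incremental" model O(n) per run -- the same
    # cost as a full refresh, but without the clean rebuild.
    return k_new if filter_ok else n_total

print(incremental_rows(1_000_000, 10_000, filter_ok=True))   # 10000
print(incremental_rows(1_000_000, 10_000, filter_ok=False))  # 1000000
```

Note that an incorrect filter can also be worse than slow: if it excludes rows it should include, data is silently dropped, which is a correctness problem on top of the performance one.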