Full refresh vs incremental in dbt

Introduction

dbt can bring a model's table up to date in two ways. A full refresh drops the table and rebuilds it from all source data, while an incremental run processes only rows that are new or changed since the last run.

Use a full refresh when you want to reload all data from scratch to ensure accuracy.
Use a full refresh when your data source is small or is often replaced entirely.
Use incremental updates when you want to save time by adding only new data instead of reloading everything.
Use incremental updates when your data grows large and full reloads take too long.
Use incremental updates when you want to keep your data warehouse up to date with recent changes.
Syntax
dbt
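# YAML in a properties file, for example models/schema.yml
models:
  - name: my_model
    config:
      materialized: incremental
      incremental_strategy: merge  # available strategies depend on the adapter
      unique_key: id

-- SQL inside the model (this config block is an alternative to the YAML above)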
{{ config(materialized='incremental', unique_key='id') }}

SELECT * FROM source_table
{% if is_incremental() %}
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}

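materialized: defines how dbt builds the model (for example table, view, or incremental).

is_incremental(): returns true only when the model is materialized as incremental, its target table already exists, and the run does not use the --full-refresh flag. Wrap SQL in it, such as a filter, that should apply only on incremental runs.

{{ this }}: refers to the model's own existing table in the warehouse.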
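To make the behaviour of is_incremental() concrete, here is a small Python sketch of the decision and of the SQL that would result. It is only an illustration, not dbt's code: would_be_incremental and render_model are hypothetical names, and the table name my_model stands in for {{ this }}.

```python
def would_be_incremental(materialized: str, target_exists: bool, full_refresh: bool) -> bool:
    """Hypothetical stand-in for is_incremental(): all three conditions must hold."""
    return materialized == "incremental" and target_exists and not full_refresh


def render_model(incremental: bool) -> str:
    """Build the SQL the model would produce, with or without the filter."""
    sql = "SELECT * FROM source_table"
    if incremental:
        # Only added on incremental runs; my_model stands in for {{ this }}.
        sql += "\nWHERE updated_at > (SELECT MAX(updated_at) FROM my_model)"
    return sql


# First run: the target table does not exist yet, so no filter is applied.
print(render_model(would_be_incremental("incremental", target_exists=False, full_refresh=False)))
# Later run: the table exists and no full refresh was requested, so the filter is added.
print(render_model(would_be_incremental("incremental", target_exists=True, full_refresh=False)))
```

The first call prints the plain SELECT, and the second prints the SELECT with the WHERE clause appended.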

Examples
This model is materialized as a table, so dbt rebuilds the whole table from scratch on every run. In effect, every run is a full refresh.
dbt
{{ config(materialized='table') }}

SELECT * FROM source_table
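This is an incremental model. On incremental runs it selects only rows whose updated_at is later than the latest value already in the table. Because unique_key is set, a row whose id already exists is updated instead of being inserted again.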
dbt
{{ config(materialized='incremental', unique_key='id') }}

SELECT * FROM source_table
{% if is_incremental() %}
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
Sample Program

This dbt model uses incremental materialization. On first run, it loads all data. On later runs, it adds only rows with newer updated_at values.

dbt
-- dbt model: incremental_example.sql

{{ config(materialized='incremental', unique_key='id') }}

SELECT id, name, updated_at FROM source_table

{% if is_incremental() %}
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
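The run-by-run behaviour of the model above can be imitated with Python's built-in sqlite3 module. This is a rough simulation under stated assumptions, not how dbt is implemented: run_model is a hypothetical helper, the sample rows are made up, and the unique_key handling is approximated by deleting matching ids before inserting.

```python
import sqlite3

MODEL = "incremental_example"


def run_model(conn, full_refresh=False):
    """Simulate one run of the model; returns the number of rows written."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (MODEL,)
    ).fetchone() is not None

    if full_refresh or not exists:
        # First run or full refresh: rebuild the table from all source rows.
        conn.execute(f"DROP TABLE IF EXISTS {MODEL}")
        conn.execute(f"CREATE TABLE {MODEL} AS SELECT id, name, updated_at FROM source_table")
        return conn.execute(f"SELECT COUNT(*) FROM {MODEL}").fetchone()[0]

    # Incremental run: same filter as the is_incremental() branch of the model.
    new_rows = conn.execute(
        f"""SELECT id, name, updated_at FROM source_table
            WHERE updated_at > (SELECT MAX(updated_at) FROM {MODEL})"""
    ).fetchall()
    # Approximates unique_key='id': drop rows with a matching id, then insert.
    conn.executemany(f"DELETE FROM {MODEL} WHERE id = ?", [(r[0],) for r in new_rows])
    conn.executemany(f"INSERT INTO {MODEL} VALUES (?, ?, ?)", new_rows)
    return len(new_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (id INTEGER, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [(1, "alice", "2024-01-01"), (2, "bob", "2024-01-02")],
)

first = run_model(conn)   # table does not exist yet, so everything is loaded
conn.execute("UPDATE source_table SET name = 'bobby', updated_at = '2024-01-03' WHERE id = 2")
conn.execute("INSERT INTO source_table VALUES (3, 'carol', '2024-01-04')")
second = run_model(conn)  # picks up only the changed row and the new row
rows = conn.execute(f"SELECT id, name FROM {MODEL} ORDER BY id").fetchall()
print(first, second, rows)
```

Both runs write two rows. After the second run the table holds three rows, with id 2 updated to 'bobby' instead of appearing twice.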
Important Notes

A full refresh rebuilds the entire table and can be slow for large data. To force one for an incremental model, run dbt run --full-refresh.

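Set unique_key on incremental models so that changed rows update existing rows. Without it, dbt appends the selected rows, and a record that was updated in the source can appear more than once.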
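The effect of the unique key can be seen in a few lines of plain Python. This is a toy model of the idea, not dbt's merge logic: incremental_load is a hypothetical function and the rows are invented for illustration.

```python
def incremental_load(target, new_rows, unique_key=None):
    """Illustrative only: add new_rows to target, replacing matches on unique_key if given."""
    if unique_key is None:
        # No key: rows are simply appended, so a changed record is duplicated.
        return target + new_rows
    incoming = {row[unique_key] for row in new_rows}
    kept = [row for row in target if row[unique_key] not in incoming]
    return kept + new_rows


target = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
changed = [{"id": 2, "name": "bobby"}]

appended = incremental_load(target, changed)                 # id 2 appears twice
merged = incremental_load(target, changed, unique_key="id")  # id 2 is replaced
print(len(appended), len(merged))
```

Without a key the target grows to three rows with two versions of id 2. With the key it stays at two rows and id 2 carries the new name.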

Use is_incremental() to wrap filters that should apply only on incremental runs. On the first run and on full refreshes the filter is skipped and the model selects all rows.

Summary

Full refresh reloads all data every time.

Incremental updates add only new or changed data.

Incremental saves time and resources for large datasets.