
Incremental strategies (append, merge, delete+insert) in dbt - Deep Dive

Overview - Incremental strategies (append, merge, delete+insert)
What is it?
Incremental strategies in dbt are ways to update data models efficiently by only processing new or changed data instead of rebuilding everything from scratch. The three main strategies are append, merge, and delete+insert. Append adds new rows, merge updates existing rows and adds new ones, and delete+insert replaces data by deleting old rows and inserting fresh ones. These methods help save time and computing resources when working with large datasets.
Why it matters
Without incremental strategies, every data update would require rebuilding entire tables, which can be slow and costly. Incremental updates let data teams refresh only what changed, making pipelines faster and more efficient. This means quicker insights and less strain on data warehouses, which is crucial for businesses relying on timely data decisions.
Where it fits
Learners should first understand basic SQL and dbt models before tackling incremental strategies. After mastering these, they can explore advanced data engineering topics like data freshness, partitioning, and orchestration. Incremental strategies are a key step toward building scalable, maintainable data pipelines.
Mental Model
Core Idea
Incremental strategies update data by processing only new or changed parts instead of rebuilding everything.
Think of it like...
It's like updating a photo album by adding new pictures, fixing some existing ones, or replacing old pages, instead of printing a whole new album every time.
┌───────────────┐
│ Full Refresh  │
│ (Rebuild all) │
└──────┬────────┘
       │
       ▼
┌─────────────────────────────┐
│ Incremental Strategies      │
│ ┌───────────┐ ┌───────────┐ │
│ │ Append    │ │ Merge     │ │
│ │ (Add new) │ │ (Add+Upd) │ │
│ └───────────┘ └───────────┘ │
│        ┌───────────────┐    │
│        │ Delete+Insert │    │
│        │ (Replace data)│    │
│        └───────────────┘    │
└─────────────────────────────┘
Build-Up - 7 Steps
1
Foundation: What is Incremental Loading
Concept: Introduce the basic idea of incremental loading as updating data by adding only new information.
Imagine you have a list of sales every day. Instead of rewriting the whole list every time, you just add the new day's sales to the end. This is incremental loading — adding new data without touching old data.
Result
You save time and resources by not reprocessing old data.
Understanding incremental loading helps you see why full rebuilds are often unnecessary and inefficient.
2
Foundation: dbt Incremental Model Basics
Concept: Learn how dbt supports incremental models to update tables efficiently.
In dbt, you can write models that process only new or changed data by using the is_incremental() macro. During an incremental run, dbt applies your filter and appends or updates data instead of rebuilding the whole table.
Result
dbt runs faster and uses less warehouse resources during incremental runs.
Knowing how dbt detects incremental runs is key to writing efficient models.
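As a minimal sketch, a dbt incremental model can look like this. The model and column names (raw_events, event_time) are illustrative, not from the source:

```sql
-- models/events.sql: a minimal dbt incremental model (illustrative names)
{{ config(materialized='incremental') }}

select id, event_time, payload
from {{ ref('raw_events') }}

{% if is_incremental() %}
  -- Only on incremental runs: pull rows newer than what the target
  -- table ({{ this }}) already contains.
  where event_time > (select max(event_time) from {{ this }})
{% endif %}
```

On the first run (or with --full-refresh) the {% if %} block is skipped and the whole table is built; on later runs only the filtered rows are processed.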
3
Intermediate: Append Strategy Explained
🤔 Before reading on: do you think append can update existing rows, or only add new ones? Commit to your answer.
Concept: Append adds only new rows to the existing table without changing old rows.
The append strategy assumes rows never change once added. For example, daily logs are simply appended as new rows: the model filters for new data and inserts it into the existing table.
Result
The table grows by adding new rows, old data stays untouched.
Understanding append clarifies when it's safe to just add data without checking for updates.
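A sketch of an append-only model in dbt (names like raw_logs and logged_at are illustrative):

```sql
-- models/daily_logs.sql: append-only incremental model (illustrative names)
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

select log_id, logged_at, message
from {{ ref('raw_logs') }}

{% if is_incremental() %}
  -- New rows are simply inserted; existing rows are never touched.
  where logged_at > (select max(logged_at) from {{ this }})
{% endif %}
```

Note there is no unique_key here: append never needs one, because it never matches against existing rows.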
4
Intermediate: Merge Strategy Explained
🤔 Before reading on: do you think merge can handle both new and updated rows? Commit to your answer.
Concept: Merge updates existing rows if they changed and adds new rows if they are new.
Merge uses a unique key to find matching rows. If a row exists, it updates it; if not, it inserts it. This is useful when data can change, like customer info or product prices.
Result
The table stays up-to-date with both new and changed data.
Knowing merge helps you handle data that changes over time, not just grows.
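A sketch of a merge-strategy model; the unique_key tells dbt which rows count as "the same" row (customer_id, updated_at, and the table names are illustrative):

```sql
-- models/customers.sql: merge strategy keyed on customer_id (illustrative)
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='customer_id'
) }}

select customer_id, name, email, updated_at
from {{ ref('raw_customers') }}

{% if is_incremental() %}
  -- Pull only rows changed since the last run; dbt merges them into the
  -- target on customer_id (update if matched, insert if not).
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```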
5
Intermediate: Delete+Insert Strategy Explained
🤔 Before reading on: do you think delete+insert removes all data, or just some? Commit to your answer.
Concept: Delete+insert removes specific old data and replaces it with fresh data.
This strategy deletes rows matching a condition (like a date range) and inserts new rows for that range. It's useful when data corrections or reprocessing are needed.
Result
The affected data is fully refreshed without rebuilding the entire table.
Understanding delete+insert shows how to handle data corrections efficiently.
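A sketch of a delete+insert model that refreshes a recent window of dates. Names are illustrative, and dateadd() is warehouse-specific syntax (Snowflake/Redshift style), so adapt it to your warehouse:

```sql
-- models/daily_sales.sql: delete+insert keyed on sale_date (illustrative)
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='sale_date'
) }}

select sale_date, store_id, total_amount
from {{ ref('raw_sales') }}

{% if is_incremental() %}
  -- dbt deletes target rows whose sale_date appears in this result set,
  -- then inserts the fresh rows, fully refreshing the last 3 days.
  where sale_date >= dateadd(day, -3, current_date)
{% endif %}
```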
6
Advanced: Choosing the Right Incremental Strategy
🤔 Before reading on: which strategy would you pick for data that never changes? Commit to your answer.
Concept: Different data scenarios require different incremental strategies for best performance and accuracy.
For append-only data, use append. For data that changes, use merge. For data needing full refresh in parts, use delete+insert. Consider data volume, update frequency, and warehouse capabilities.
Result
You select the most efficient and correct strategy for your data pipeline.
Knowing how to match strategy to data type prevents errors and optimizes performance.
7
Expert: Handling Complex Incremental Edge Cases
🤔 Before reading on: do you think incremental models always guarantee perfect data consistency? Commit to your answer.
Concept: Incremental models can face challenges like late-arriving data, duplicates, or partial updates requiring careful handling.
Sometimes data arrives late or updates overlap. You may need to combine strategies or add logic to detect and fix duplicates. Also, warehouse-specific SQL features affect how merges and deletes work.
Result
You build robust incremental models that handle real-world data quirks.
Understanding edge cases prepares you to avoid subtle bugs and maintain data quality in production.
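One common pattern for late-arriving data is a lookback window: instead of filtering only past max(timestamp), reprocess the last few days so late updates inside that window are still merged. A sketch, with illustrative names and warehouse-specific dateadd() syntax:

```sql
-- models/orders.sql: merge with a 3-day lookback window (illustrative)
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id'
) }}

select order_id, order_ts, status
from {{ ref('raw_orders') }}

{% if is_incremental() %}
  -- Reprocess everything from 3 days before the latest loaded timestamp,
  -- not just rows strictly newer than it, so late-arriving updates within
  -- the window are caught; merge on order_id keeps this idempotent.
  where order_ts > (select dateadd(day, -3, max(order_ts)) from {{ this }})
{% endif %}
```

The window size is a trade-off: wider windows catch later data but reprocess more rows each run.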
Under the Hood
dbt incremental models run SQL that checks if the table exists and if the run is incremental. For append, it inserts filtered new rows. For merge, it uses SQL MERGE or equivalent commands to update and insert rows based on keys. For delete+insert, it deletes rows matching criteria then inserts fresh data. The warehouse executes these commands efficiently using indexes and partitions.
Why designed this way?
Incremental strategies were designed to reduce resource use and speed up data pipelines by avoiding full table rebuilds. Early data tools rebuilt everything, which was slow and costly. Incremental methods balance freshness and efficiency, adapting to different data change patterns.
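For a concrete sense of the SQL involved, a merge-strategy run compiles to warehouse DML roughly like the following. This is a sketch: the table, temp-table, and column names are illustrative, and the exact statement varies by adapter:

```sql
-- Roughly what dbt compiles a merge-strategy incremental run into
-- (illustrative names; actual DML varies by warehouse adapter)
merge into analytics.customers as target
using customers__dbt_tmp as source          -- new/changed rows from the model
  on target.customer_id = source.customer_id
when matched then update set                -- existing key: update in place
  name = source.name,
  email = source.email,
  updated_at = source.updated_at
when not matched then insert                -- new key: insert the row
  (customer_id, name, email, updated_at)
  values (source.customer_id, source.name, source.email, source.updated_at);
```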
┌───────────────────────────────┐
│ dbt Incremental Model Process │
├───────────────────────────────┤
│ Check if table exists and if  │
│ the run is incremental        │
├───────────────────────────────┤
│ If full refresh:              │
│   Rebuild entire table        │
│ Else if incremental:          │
│   ┌─────────────────┐         │
│   │ Append          │         │
│   │ Insert new rows │         │
│   ├─────────────────┤         │
│   │ Merge           │         │
│   │ Update + Insert │         │
│   ├─────────────────┤         │
│   │ Delete+Insert   │         │
│   │ Delete old +    │         │
│   │ insert new      │         │
│   └─────────────────┘         │
└───────────────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does append strategy update existing rows or only add new ones? Commit to yes or no.
Common Belief: Append strategy can update existing rows if data changes.
Reality: Append only adds new rows and never updates existing ones.
Why it matters: Using append when data changes leads to stale or duplicate data, causing inaccurate reports.
Quick: Does merge strategy always guarantee no duplicates? Commit to yes or no.
Common Belief: Merge strategy automatically prevents all duplicates.
Reality: Merge relies on correct unique keys; if keys are wrong or missing, duplicates can still occur.
Why it matters: Assuming merge fixes duplicates without proper keys can cause data integrity issues.
Quick: Is delete+insert the same as a full table rebuild? Commit to yes or no.
Common Belief: Delete+insert is just a full rebuild under a different name.
Reality: Delete+insert only replaces specific partitions or subsets, not the entire table.
Why it matters: Misunderstanding this can lead to inefficient full rebuilds when partial refreshes would suffice.
Quick: Do incremental models always run faster than full refreshes? Commit to yes or no.
Common Belief: Incremental models are always faster than full refreshes.
Reality: If incremental logic is complex or data changes widely, incremental runs can be slower than, or as slow as, full refreshes.
Why it matters: Blindly trusting incremental speed can cause unexpected pipeline delays and resource use.
Expert Zone
1
Incremental strategies depend heavily on the uniqueness and stability of keys; changing keys can break merges.
2
Warehouse-specific SQL features like MERGE syntax or partition pruning affect how incremental strategies perform and must be understood deeply.
3
Combining incremental strategies with snapshotting or change data capture can create powerful, reliable pipelines but requires careful orchestration.
When NOT to use
Incremental strategies are not suitable when data changes affect large portions of the table or when data lineage requires full rebuilds for accuracy. In such cases, full refresh or snapshot models are better alternatives.
Production Patterns
In production, teams often use append for log data, merge for slowly changing dimensions, and delete+insert for partitioned fact tables with late-arriving data. They also automate incremental runs with freshness checks and alerting to catch failures early.
Connections
Change Data Capture (CDC)
Incremental strategies build on CDC by applying changes efficiently to data models.
Understanding CDC helps grasp how incremental updates track and apply only data changes, improving pipeline efficiency.
Database Indexing
Incremental merges rely on indexes to quickly find and update rows.
Knowing indexing principles explains why merge operations can be fast or slow depending on key design.
Version Control Systems (e.g., Git)
Incremental updates in data are similar to commits in version control, where only changes are saved.
Seeing incremental data updates like code commits clarifies the value of tracking and applying only differences.
Common Pitfalls
#1 Using the append strategy on data that changes over time.
Wrong approach: if is_incremental(): select * from source where date > (select max(date) from target)
Correct approach: merge into target using source on key when matched then update when not matched then insert
Root cause: Misunderstanding that append only adds new rows and cannot update existing data.
#2 Not defining a true unique key for the merge strategy.
Wrong approach: merge into target using source on source.date = target.date when matched then update set ...
Correct approach: merge into target using source on source.primary_key = target.primary_key when matched then update set ...
Root cause: Assuming any column can serve as a unique key without ensuring uniqueness.
#3 Deleting the entire table instead of a partition for delete+insert.
Wrong approach: delete from target; insert into target select * from source where date = '2024-01-01'
Correct approach: delete from target where date = '2024-01-01'; insert into target select * from source where date = '2024-01-01'
Root cause: Confusing a full table refresh with a partial partition refresh.
Key Takeaways
Incremental strategies update data efficiently by processing only new or changed data, saving time and resources.
Append adds new rows only, merge updates existing rows and adds new ones, and delete+insert replaces specific data subsets.
Choosing the right strategy depends on data characteristics like mutability and volume.
Understanding warehouse SQL features and data keys is crucial for reliable incremental models.
Incremental updates are powerful but require careful design to avoid data quality issues.