0
0
dbtdata~15 mins

is_incremental() macro in dbt - Deep Dive

Choose your learning style9 modes available
Overview - is_incremental() macro
What is it?
The is_incremental() macro in dbt is a special function that helps you know if your current model run is an incremental update or a full refresh. Incremental updates add only new or changed data instead of rebuilding everything from scratch. This macro returns true during incremental runs and false otherwise, letting you write conditional logic in your SQL models.
Why it matters
Without the is_incremental() macro, every time you run your data model, you would have to rebuild the entire dataset, which can be slow and costly. Incremental processing saves time and resources by updating only what changed. This macro makes it easy to write models that adapt their behavior automatically, improving efficiency and scalability in data pipelines.
Where it fits
Before learning is_incremental(), you should understand basic dbt models and how dbt runs SQL transformations. After mastering this macro, you can explore advanced incremental model patterns, performance optimization, and handling slowly changing dimensions.
Mental Model
Core Idea
is_incremental() tells your model if it is running as an incremental update so you can write SQL that only processes new or changed data.
Think of it like...
Imagine you are updating a photo album. Instead of reprinting every photo each time, you only add the new pictures taken since the last update. The is_incremental() macro is like checking if you are adding new photos or starting fresh.
┌─────────────────────────────┐
│ dbt model run starts        │
├─────────────────────────────┤
│ is_incremental() returns:   │
│   ├─ True  → run incremental│
│   └─ False → full refresh   │
└─────────────────────────────┘
Build-Up - 6 Steps
1
FoundationWhat is dbt and models
🤔
Concept: Introduce dbt and the idea of models as SQL files that transform data.
dbt (data build tool) lets you write SQL queries called models to transform raw data into clean, usable tables. Each model runs as a SQL query that creates or replaces a table or view in your database.
Result
You understand that dbt runs SQL models to build tables.
Knowing what a dbt model is helps you see where is_incremental() fits in the process of building data.
2
FoundationFull refresh vs incremental runs
🤔
Concept: Explain the difference between rebuilding all data and updating only new data.
A full refresh rebuilds the entire table every time, which can be slow for large datasets. Incremental runs add or update only the new or changed rows, making the process faster and cheaper.
Result
You can distinguish between full and incremental runs in dbt.
Understanding these two run types is key to why is_incremental() exists.
3
IntermediateUsing is_incremental() in SQL models
🤔Before reading on: do you think is_incremental() returns true during full refreshes or incremental runs? Commit to your answer.
Concept: Learn how to use is_incremental() to write conditional SQL that runs only during incremental updates.
Inside your model SQL, you can write: {% raw %} {% if is_incremental() %} -- SQL for incremental update {% else %} -- SQL for full refresh {% endif %} {% endraw %} This lets you control which rows to insert or update only when running incrementally.
Result
Your model runs different SQL depending on the run type.
Knowing how to branch logic with is_incremental() lets you optimize data processing.
4
IntermediateWriting incremental filters with is_incremental()
🤔Before reading on: do you think you should filter new rows inside or outside the is_incremental() block? Commit to your answer.
Concept: Use is_incremental() to filter only new or changed rows during incremental runs.
Example: {% raw %} select * from source_table {% if is_incremental() %} where updated_at > (select max(updated_at) from {{ this }}) {% endif %} {% endraw %} This SQL selects all rows on full refresh, but only new rows on incremental runs.
Result
Incremental runs process only new or updated data.
Filtering inside is_incremental() prevents reprocessing unchanged data, improving efficiency.
5
AdvancedHandling schema changes with is_incremental()
🤔Before reading on: do you think is_incremental() handles schema changes automatically? Commit to your answer.
Concept: Understand that incremental runs assume stable schema; schema changes require full refresh or special handling.
If your source schema changes (new columns, types), incremental runs may fail or produce incorrect results. You can use is_incremental() to detect incremental runs and force a full refresh or add migration logic.
Result
You avoid errors caused by schema changes during incremental updates.
Knowing the limits of incremental runs helps prevent subtle bugs in production.
6
ExpertAdvanced incremental patterns and pitfalls
🤔Before reading on: do you think stacking multiple incremental models always improves performance? Commit to your answer.
Concept: Explore complex incremental strategies, including handling deletes, updates, and dependencies.
Incremental models can be chained, but you must carefully manage keys and filters to avoid duplicates or missing data. Also, dbt's is_incremental() does not detect deletes; you need custom logic or snapshots for that. Understanding these nuances is critical for robust pipelines.
Result
You can build reliable, efficient incremental pipelines with complex logic.
Mastering incremental nuances prevents data quality issues and maximizes pipeline performance.
Under the Hood
When dbt runs a model, it sets a flag indicating if the run is incremental or full refresh. The is_incremental() macro checks this flag at runtime and returns true or false accordingly. This allows the SQL code to branch dynamically. Internally, dbt manages incremental state by tracking the target table and its data, enabling selective inserts or updates.
Why designed this way?
Incremental processing was designed to optimize large data transformations by avoiding full rebuilds. The macro approach lets users write flexible SQL without changing dbt's core engine. Alternatives like separate models or manual flags would be less elegant and harder to maintain.
┌───────────────┐
│ dbt run start │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Check run type│
│ (full or incr)│
└──────┬────────┘
       │
       ▼
┌─────────────────────────────┐
│ is_incremental() returns bool│
└──────┬──────────────────────┘
       │
       ▼
┌─────────────────────────────┐
│ SQL branches based on result │
└─────────────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does is_incremental() return true during full refresh runs? Commit yes or no.
Common Belief:is_incremental() returns true anytime the model runs.
Tap to reveal reality
Reality:is_incremental() returns true only during incremental runs, not full refreshes.
Why it matters:Assuming it returns true always can cause your model to skip rebuilding data, leading to stale or incomplete tables.
Quick: Can is_incremental() detect deleted rows automatically? Commit yes or no.
Common Belief:is_incremental() handles inserts, updates, and deletes automatically.
Tap to reveal reality
Reality:is_incremental() only helps with inserts and updates; deletes require extra logic or snapshots.
Why it matters:Ignoring deletes can cause your data to accumulate outdated rows, harming accuracy.
Quick: Does using is_incremental() guarantee faster runs regardless of data size? Commit yes or no.
Common Belief:Using is_incremental() always makes model runs faster.
Tap to reveal reality
Reality:Incremental runs are faster only if the filtering logic is correct and the data volume of new rows is small.
Why it matters:Incorrect filters or large incremental loads can make incremental runs as slow or slower than full refreshes.
Quick: Does is_incremental() automatically handle schema changes? Commit yes or no.
Common Belief:is_incremental() adapts automatically to schema changes in source data.
Tap to reveal reality
Reality:Schema changes often require a full refresh or manual migration steps; is_incremental() does not handle this.
Why it matters:Failing to manage schema changes can cause errors or corrupted data during incremental runs.
Expert Zone
1
Incremental models require a unique key or timestamp to filter new data correctly; missing this leads to duplicates or missed rows.
2
is_incremental() does not track deletes; combining incremental models with dbt snapshots or external CDC tools is necessary for full change capture.
3
Using is_incremental() inside macros or nested logic can cause unexpected behavior if the macro context is not properly managed.
When NOT to use
Avoid using is_incremental() when your data source lacks reliable unique keys or timestamps, or when your data changes require full rebuilds due to schema evolution. In such cases, prefer full refresh models or dbt snapshots for change tracking.
Production Patterns
In production, teams use is_incremental() to build large fact tables that update daily with new events. They combine it with partitioning and careful filtering to minimize processing time. For slowly changing dimensions, they use snapshots alongside incremental models to capture deletes and history.
Connections
Change Data Capture (CDC)
Builds-on
Understanding is_incremental() helps grasp how incremental data pipelines relate to CDC, which tracks all data changes including deletes.
Version Control Systems
Similar pattern
Like incremental commits in version control only save changes, is_incremental() enables saving only new data, optimizing storage and processing.
Event-driven Programming
Conceptual analogy
Both is_incremental() and event-driven systems react only to new inputs or changes, avoiding unnecessary full reprocessing.
Common Pitfalls
#1Not filtering new data inside is_incremental() block
Wrong approach:select * from source_table
Correct approach:select * from source_table where updated_at > (select max(updated_at) from {{ this }})
Root cause:Learners forget to limit data during incremental runs, causing full data reloads and losing performance benefits.
#2Assuming is_incremental() handles deletes automatically
Wrong approach:{% if is_incremental() %} insert new rows only {% endif %} -- no delete logic
Correct approach:Use dbt snapshots or custom delete logic alongside incremental inserts
Root cause:Misunderstanding that incremental macro only controls inserts/updates, not deletions.
#3Running incremental model after schema change without full refresh
Wrong approach:Run incremental model immediately after adding a new column
Correct approach:Run dbt full-refresh or drop and recreate table before incremental runs
Root cause:Not realizing incremental runs assume stable schema, leading to errors or corrupted data.
Key Takeaways
The is_incremental() macro lets your dbt models know if they are running incrementally or as a full refresh.
Using is_incremental() allows you to write SQL that processes only new or changed data, saving time and resources.
Incremental runs require careful filtering and stable schema to work correctly; otherwise, full refreshes are safer.
is_incremental() does not handle deletes or schema changes automatically; additional logic or tools are needed.
Mastering is_incremental() is essential for building efficient, scalable data pipelines in dbt.