How to Use Snapshot Strategy in dbt: Simple Guide
In dbt, use a snapshot to track changes in source tables over time by defining a snapshot file with a unique key and a strategy, either `timestamp` or `check`. Each time you run `dbt snapshot`, dbt records a new historical version of every row whose data has changed since the previous run.

Syntax
The basic syntax for a dbt snapshot includes defining the snapshot block with a unique_key to identify rows, a strategy to detect changes, and the updated_at or check_cols to track updates.
- unique_key: Column(s) that uniquely identify a row.
- strategy: Method to detect changes, either `timestamp` or `check`.
- updated_at: Column with timestamp of last update (used with `timestamp` strategy).
- check_cols: Columns to compare for changes (used with `check` strategy).
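```sql
{% snapshot my_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='id',
        strategy='timestamp',
        updated_at='last_updated'
    )
}}

{# source_table stands in for your own table or source() reference #}
select * from source_table

{% endsnapshot %}
```

Example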
This example shows a snapshot that tracks changes in a customers table using the timestamp strategy. It uses customer_id as the unique key and updated_at as the timestamp column to detect changes.
```sql
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from raw.customers

{% endsnapshot %}
```

Output
The first `dbt snapshot` run creates a snapshot table in the `snapshots` schema. Each later run adds a new version of any customer row whose `updated_at` value has advanced, so the table accumulates the history of each customer.
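
To make the stored history concrete, here is a minimal sketch of querying the snapshot table. It assumes the snapshot above has been built as `snapshots.customers_snapshot`, and it relies on the `dbt_valid_from` and `dbt_valid_to` columns that dbt adds to snapshot tables; by default `dbt_valid_to` is null for the version that is currently valid. The customer id `42` is a made-up value for illustration.

```sql
-- Current version of every customer: the record with no end date
select *
from snapshots.customers_snapshot
where dbt_valid_to is null;

-- Full change history for one customer, oldest version first
select *
from snapshots.customers_snapshot
where customer_id = 42  -- hypothetical id
order by dbt_valid_from;
```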
Common Pitfalls
Common mistakes when using dbt snapshots include:
- Not setting a proper `unique_key`, causing incorrect row matching.
- Using the wrong `strategy` for your data (e.g., using `timestamp` without a reliable timestamp column).
- Forgetting to include all columns that should trigger a change in `check_cols` when using the `check` strategy.
- Not running the `dbt snapshot` command to materialize snapshots.
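
The first pitfall can be checked before a snapshot is built. The query below is a minimal sketch, assuming the `source_table` and `id` names used in the syntax example; any row it returns is an `id` that is not unique and would be matched incorrectly.

```sql
-- Lists every id that appears more than once in the source
select
    id,
    count(*) as row_count
from source_table
group by id
having count(*) > 1;
```

An empty result suggests `id` is safe to use as the `unique_key`.
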
Example of a wrong snapshot config and the fix:
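```sql
{# Wrong: the timestamp strategy is configured without updated_at #}
{% snapshot wrong_snapshot %}

{{
    config(
        unique_key='id',
        strategy='timestamp'
    )
}}

select * from source_table

{% endsnapshot %}

{# Fix: name the column that records each row's last update #}
{% snapshot correct_snapshot %}

{{
    config(
        unique_key='id',
        strategy='timestamp',
        updated_at='last_modified'
    )
}}

select * from source_table

{% endsnapshot %}
```

Quick Reference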
| Property | Description | Example |
|---|---|---|
| unique_key | Column(s) uniquely identifying each row | `unique_key='id'` |
| strategy | Method to detect changes: 'timestamp' or 'check' | `strategy='timestamp'` |
| updated_at | Timestamp column for 'timestamp' strategy | `updated_at='updated_at'` |
| check_cols | Columns to compare for changes in 'check' strategy | `check_cols=['col1', 'col2']` |
| target_schema | Schema where snapshot table is created | `target_schema='snapshots'` |
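
The `check_cols` property is the only one in the table that the examples do not exercise. A minimal sketch of a check-strategy snapshot might look like this, assuming a hypothetical `raw.orders` table with `order_id`, `status`, and `shipping_address` columns. dbt also accepts `check_cols='all'` to compare every column.

```sql
{% snapshot orders_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='order_id',
        strategy='check',
        check_cols=['status', 'shipping_address']
    )
}}

{# raw.orders and its columns are hypothetical names for illustration #}
select * from raw.orders

{% endsnapshot %}
```

With this config, a new version of an order is recorded only when `status` or `shipping_address` differs from the last snapshotted values; changes to other columns do not trigger a new version.
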
Key Takeaways
Use a unique key to identify rows in your snapshot.
Choose 'timestamp' strategy with a reliable timestamp column or 'check' strategy with columns to compare.
Always run 'dbt snapshot' to create or update snapshot tables.
Ensure your snapshot config includes all necessary fields to detect changes.
Snapshots help track historical changes in source data automatically.