What Is a Snapshot in dbt: Definition and Usage Explained
In dbt, a snapshot is a way to capture and store changes in your source data over time. It helps you track how records evolve by saving historical versions, enabling analysis of data changes and trends.
How It Works
A snapshot in dbt works like taking a photo of your data each time you run the dbt snapshot command, which is usually scheduled at regular intervals. Imagine you have a list of customers whose details, such as address or status, can change over time. Instead of keeping only the latest version, a snapshot saves each change it detects as a new record with a timestamp.
This means you can see not only the current state but also how the data looked in the past. dbt manages this by comparing the current data with the last snapshot and storing only the rows that have changed, making it efficient and easy to track history.
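The comparison step described above can be sketched in plain Python. This is a simplified illustration of the idea, not how dbt implements it (dbt generates SQL that runs in your warehouse). The function name apply_snapshot and the sample rows are hypothetical; dbt_valid_from and dbt_valid_to are the validity columns dbt adds to snapshot tables.

```python
from datetime import datetime

CHECK_COLS = ("email", "status")  # columns compared on each run


def apply_snapshot(snapshot, current_rows, run_at, key="customer_id"):
    """Mimic one snapshot run with a 'check'-style comparison (illustrative only).

    snapshot: list of dicts, each carrying dbt_valid_from / dbt_valid_to.
    current_rows: list of dicts read from the source at time run_at.
    """
    # Index the open (still current) version of each record by its key.
    open_rows = {r[key]: r for r in snapshot if r["dbt_valid_to"] is None}
    for row in current_rows:
        old = open_rows.get(row[key])
        if old is not None and all(old[c] == row[c] for c in CHECK_COLS):
            continue  # unchanged, so nothing new is stored
        if old is not None:
            old["dbt_valid_to"] = run_at  # close the previous version
        # Store the new version as an open-ended record.
        snapshot.append({**row, "dbt_valid_from": run_at, "dbt_valid_to": None})
    return snapshot


history = []
row_v1 = {"customer_id": 1, "email": "a@example.com", "status": "active"}
row_v2 = {"customer_id": 1, "email": "a@example.com", "status": "churned"}

apply_snapshot(history, [row_v1], datetime(2024, 1, 1))  # first run: one row stored
apply_snapshot(history, [row_v1], datetime(2024, 1, 2))  # no change: nothing stored
apply_snapshot(history, [row_v2], datetime(2024, 1, 3))  # status changed: new version
```

After the three runs, history holds two versions of customer 1: the original row, closed on the date of the third run, and the changed row, which stays open until the next change is detected.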
Example
This example shows a simple dbt snapshot that tracks changes in a customers table by comparing the email and status fields.
{% snapshot customers_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='check',
      check_cols=['email', 'status']
    )
}}

-- Rows are matched on customer_id; a new version is stored when email or status changes
select customer_id, email, status, updated_at
from raw.customers

{% endsnapshot %}
When to Use
Use snapshots when you need to track how data changes over time but your source system does not keep history. Typical cases include tracking customer status changes, product price updates, or employee role changes.
This is useful for audits, trend analysis, or building slowly changing dimensions in data warehouses where you want to keep a full history of changes.
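To show why a full history helps with audits, here is a minimal Python sketch of a point-in-time lookup over rows shaped like a snapshot table. The sample rows and the helper status_as_of are hypothetical; the sketch assumes dbt's default behavior, where dbt_valid_to is empty (None here) for the current version of a record.

```python
from datetime import datetime

# Hypothetical rows shaped like a snapshot table for one customer.
versions = [
    {"customer_id": 1, "status": "trial",
     "dbt_valid_from": datetime(2024, 1, 1), "dbt_valid_to": datetime(2024, 2, 1)},
    {"customer_id": 1, "status": "active",
     "dbt_valid_from": datetime(2024, 2, 1), "dbt_valid_to": None},
]


def status_as_of(rows, customer_id, when):
    """Return the status valid for customer_id at time `when`, or None if no version matches."""
    for r in rows:
        started = r["dbt_valid_from"] <= when
        not_ended = r["dbt_valid_to"] is None or when < r["dbt_valid_to"]
        if r["customer_id"] == customer_id and started and not_ended:
            return r["status"]
    return None


january = status_as_of(versions, 1, datetime(2024, 1, 15))
march = status_as_of(versions, 1, datetime(2024, 3, 1))
before_signup = status_as_of(versions, 1, datetime(2023, 12, 1))
```

In a warehouse, the same lookup would typically be a SQL filter on the two validity columns of the snapshot table.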
Key Points
- Snapshots capture historical changes by storing versions of records.
- dbt compares current data with previous snapshots to detect changes.
- They are useful for tracking slowly changing data without native history in source systems.
- Snapshot tables include start and end timestamps (the dbt_valid_from and dbt_valid_to columns) for each record version.