
How to Use Snapshot Strategy in dbt: Simple Guide

In dbt, snapshots track changes to source tables over time. You define a snapshot with a unique key and a strategy, either timestamp or check, and each run of the dbt snapshot command records a new version of every row that has changed since the previous run.

Syntax

A dbt snapshot is a select statement wrapped in a snapshot block. Its config sets a unique_key to identify rows, a strategy to detect changes, and either updated_at (for the timestamp strategy) or check_cols (for the check strategy).

  • unique_key: Column(s) that uniquely identify a row.
  • strategy: Method to detect changes, either timestamp or check.
  • updated_at: Column with timestamp of last update (used with timestamp strategy).
  • check_cols: Columns to compare for changes (used with check strategy).
sql
{% snapshot my_snapshot %}

{{
  config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='last_updated'
  )
}}

select * from source_table

{% endsnapshot %}
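
The syntax above uses the timestamp strategy. When a table has no reliable timestamp column, the check strategy compares the listed columns instead. The following is a minimal sketch, assuming a source_table with status and amount columns; those column names and the snapshot name my_check_snapshot are illustrative, not taken from a real project.

```sql
{% snapshot my_check_snapshot %}

{{
  config(
    target_schema='snapshots',
    unique_key='id',
    strategy='check',
    check_cols=['status', 'amount']
  )
}}

{# status and amount are hypothetical columns. A new version of a row is
   recorded when either value differs from the stored version. #}
select * from source_table

{% endsnapshot %}
```

Setting check_cols='all' compares every column, which is easier to maintain but may be slower on wide tables.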

Example

This example shows a snapshot that tracks changes in a customers table using the timestamp strategy. It uses customer_id as the unique key and updated_at as the timestamp column to detect changes.

sql
{% snapshot customers_snapshot %}

{{
  config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at'
  )
}}

select * from raw.customers

{% endsnapshot %}
Output
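The first time you run dbt snapshot, dbt creates the customers_snapshot table in the 'snapshots' schema. On each later run, any customer whose 'updated_at' value is newer than the stored version gets a new row, and the previous version is closed by filling in its dbt_valid_to column.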
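
To see what the snapshot captured, you can query it like any other relation. The sketch below assumes customers_snapshot has been run at least once and that the query lives in a dbt model; the model file name is hypothetical. dbt adds dbt_valid_from and dbt_valid_to columns to every snapshot table, and by default dbt_valid_to is null for the current version of a row.

```sql
{# Hypothetical model, e.g. models/customers_current.sql #}
select
    customer_id,
    dbt_valid_from,
    dbt_valid_to
from {{ ref('customers_snapshot') }}
where dbt_valid_to is null  -- keep only the current version of each customer
```

Dropping the where clause returns every recorded version. Ordering by customer_id and dbt_valid_from then shows each customer's history in sequence.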

Common Pitfalls

Common mistakes when using dbt snapshots include:

  • Not setting a proper unique_key, causing incorrect row matching.
  • Choosing a strategy that does not fit your data (e.g., using timestamp without a reliable timestamp column).
  • Forgetting to include all columns that should trigger a change in check_cols when using the check strategy.
  • Forgetting to run the dbt snapshot command (or dbt build); dbt run alone does not execute snapshots.

Example of a wrong snapshot config and the fix:

sql
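{# Wrong: the timestamp strategy is selected, but updated_at is missing #}
{% snapshot wrong_snapshot %}

{{
  config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp'
  )
}}

select * from source_table

{% endsnapshot %}

{# Fix: name the timestamp column in updated_at #}
{% snapshot correct_snapshot %}

{{
  config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='last_modified'
  )
}}

select * from source_table

{% endsnapshot %}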
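
The first pitfall, a unique_key that is not actually unique, can be caught before the snapshot runs. The following is a minimal sketch, assuming the id column and source_table from the example above: the query returns every key value that appears more than once, so an empty result means the key is safe to use.

```sql
-- Returns one row per duplicated key; no rows means id is unique
select
    id,
    count(*) as row_count
from source_table
group by id
having count(*) > 1
```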

Quick Reference

| Property | Description | Example |
| --- | --- | --- |
| unique_key | Column(s) uniquely identifying each row | `unique_key='id'` |
| strategy | Method to detect changes: 'timestamp' or 'check' | `strategy='timestamp'` |
| updated_at | Timestamp column for 'timestamp' strategy | `updated_at='updated_at'` |
| check_cols | Columns to compare for changes in 'check' strategy | `check_cols=['col1', 'col2']` |
| target_schema | Schema where snapshot table is created | `target_schema='snapshots'` |

Key Takeaways

  • Use a unique key to identify rows in your snapshot.
  • Choose the 'timestamp' strategy when you have a reliable timestamp column, or the 'check' strategy with a list of columns to compare.
  • Run 'dbt snapshot' regularly to create or update snapshot tables; changes between runs are not captured.
  • Make sure your snapshot config includes every field its strategy needs to detect changes.
  • Snapshots record historical changes in source data each time they run.