What is the main purpose of using snapshot tables in dbt for historical tracking?
Think about why you would want to keep past versions of data.
Snapshot tables in dbt are designed to capture changes in source data over time, allowing you to track historical states for auditing, trend analysis, and debugging.
Given the following snapshot configuration and source data changes, what will be the output in the snapshot table after the second run?
-- Snapshot config unique_key: id strategy: timestamp updated_at: last_updated -- Source data before first run: id | value | last_updated 1 | 100 | 2024-01-01 10:00:00 2 | 200 | 2024-01-01 10:00:00 -- Source data before second run: id | value | last_updated 1 | 110 | 2024-01-02 10:00:00 2 | 200 | 2024-01-01 10:00:00 3 | 300 | 2024-01-02 10:00:00
Remember that snapshots keep history and mark old rows as current=false.
When a row changes (id=1), the old row is marked current=false and a new row with updated value and current=true is added. New rows (id=3) are added as current=true. Unchanged rows (id=2) remain current=true.
A dbt snapshot is configured with the 'check_cols' strategy to track changes. However, after multiple runs, the snapshot table only contains the latest version of each row, with no historical versions. What is the most likely cause?
Think about what happens if no columns are checked for changes.
If 'check_cols' is empty, dbt does not detect any changes in rows, so it never creates new historical rows, resulting in only the latest snapshot per unique key.
Consider a snapshot configured with 'unique_key' on 'customer_id' and 'strategy: timestamp' using 'updated_at' column. The source table has 3 customers initially. Over 3 snapshot runs, customer 1 updates twice, customer 2 updates once, and customer 3 never updates. How many rows will the snapshot table contain after the third run?
Remember that unchanged rows do not create new snapshot rows.
Initial 3 rows exist. Customer 1 updates twice, so 2 new rows added. Customer 2 updates once, so 1 new row added. Customer 3 never updates, so no new rows. Total = 3 + 2 + 1 = 6 rows.
You need to track changes in a customer dimension table where some columns rarely change (e.g., phone number) and others change frequently (e.g., last login). You want to keep historical versions only when rare-change columns update, ignoring frequent changes. Which snapshot configuration best achieves this?
Think about how to ignore frequent changes in snapshot history.
Using 'strategy: check' with 'check_cols' limited to rare-change columns ensures snapshots only create new versions when those columns change, ignoring frequent updates like last login.