0
0
dbtdata~20 mins

Snapshot tables for historical tracking in dbt - Practice Problems & Coding Challenges

Choose your learning style9 modes available
Challenge - 5 Problems
🎖️
Snapshot Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate
1:30remaining
Purpose of Snapshot Tables in dbt

What is the main purpose of using snapshot tables in dbt for historical tracking?

ATo store only the latest state of data without any history.
BTo create temporary tables for intermediate calculations during transformations.
CTo capture and store changes in source data over time for auditing and trend analysis.
DTo optimize query performance by indexing source tables.
Attempts:
2 left
💡 Hint

Think about why you would want to keep past versions of data.

Predict Output
intermediate
2:00remaining
Output of a dbt Snapshot with Unique Key Changes

Given the following snapshot configuration and source data changes, what will be the output in the snapshot table after the second run?

-- Snapshot config
unique_key: id
strategy: timestamp
updated_at: last_updated

-- Source data before first run:
id | value | last_updated
1  | 100   | 2024-01-01 10:00:00
2  | 200   | 2024-01-01 10:00:00

-- Source data before second run:
id | value | last_updated
1  | 110   | 2024-01-02 10:00:00
2  | 200   | 2024-01-01 10:00:00
3  | 300   | 2024-01-02 10:00:00
ASnapshot table will have 4 rows: id=1 with value=100 (current=false), id=1 with value=110 (current=true), id=2 with value=200 (current=true), id=3 with value=300 (current=true).
BSnapshot table will have 3 rows: id=1 with value=110, id=2 with value=200, id=3 with value=300, all with current=true.
CSnapshot table will have 2 rows: id=1 with value=110, id=3 with value=300, both current=true; id=2 is removed.
DSnapshot table will have 3 rows: id=1 with value=100, id=2 with value=200, id=3 with value=300, all current=false.
Attempts:
2 left
💡 Hint

Remember that snapshots keep history and mark old rows as current=false.

🔧 Debug
advanced
2:00remaining
Identifying the Cause of Missing Historical Rows in Snapshot

A dbt snapshot is configured with the 'check_cols' strategy to track changes. However, after multiple runs, the snapshot table only contains the latest version of each row, with no historical versions. What is the most likely cause?

AThe snapshot is configured with 'strategy: timestamp' instead of 'check_cols'.
BThe source table has no updated rows, so no new snapshot rows are created.
CThe 'unique_key' is not correctly set, causing rows to overwrite each other.
DThe 'check_cols' parameter is set to an empty list, so no columns are checked for changes.
Attempts:
2 left
💡 Hint

Think about what happens if no columns are checked for changes.

data_output
advanced
2:00remaining
Number of Rows in Snapshot Table After Multiple Runs

Consider a snapshot configured with 'unique_key' on 'customer_id' and 'strategy: timestamp' using 'updated_at' column. The source table has 3 customers initially. Over 3 snapshot runs, customer 1 updates twice, customer 2 updates once, and customer 3 never updates. How many rows will the snapshot table contain after the third run?

A5 rows: 3 original + 2 updates for customer 1 + 1 update for customer 2 (duplicates counted).
B6 rows: 3 original + 2 updates for customer 1 + 1 update for customer 2.
C7 rows: 3 original + 2 updates for customer 1 + 1 update for customer 2 + 1 original for customer 3.
D6 rows: 1 original + 2 updates for customer 1 + 1 update for customer 2 + 2 for customer 3.
Attempts:
2 left
💡 Hint

Remember that unchanged rows do not create new snapshot rows.

🚀 Application
expert
3:00remaining
Designing a Snapshot for Slowly Changing Dimensions

You need to track changes in a customer dimension table where some columns rarely change (e.g., phone number) and others change frequently (e.g., last login). You want to keep historical versions only when rare-change columns update, ignoring frequent changes. Which snapshot configuration best achieves this?

AUse 'strategy: check' with 'check_cols' set to only the rare-change columns.
BUse 'strategy: timestamp' with 'updated_at' column that updates on any change.
CUse 'strategy: check' with 'check_cols' set to all columns including frequent-change columns.
DUse 'strategy: timestamp' with a custom 'updated_at' column that updates on any change.
Attempts:
2 left
💡 Hint

Think about how to ignore frequent changes in snapshot history.