Challenge - 5 Problems

🎖️

Snapshot Mastery

Get all challenges correct to earn this badge!

Test your skills under time pressure!

🧠 Conceptual

intermediate

1:30remaining

Purpose of Snapshot Tables in dbt

What is the main purpose of using snapshot tables in dbt for historical tracking?

ATo store only the latest state of data without any history.

BTo create temporary tables for intermediate calculations during transformations.

CTo capture and store changes in source data over time for auditing and trend analysis.

DTo optimize query performance by indexing source tables.

Attempts:

2 left

❓ Predict Output

intermediate

2:00remaining

Output of a dbt Snapshot with Unique Key Changes

Given the following snapshot configuration and source data changes, what will be the output in the snapshot table after the second run?

-- Snapshot config
unique_key: id
strategy: timestamp
updated_at: last_updated

-- Source data before first run:
id | value | last_updated
1  | 100   | 2024-01-01 10:00:00
2  | 200   | 2024-01-01 10:00:00

-- Source data before second run:
id | value | last_updated
1  | 110   | 2024-01-02 10:00:00
2  | 200   | 2024-01-01 10:00:00
3  | 300   | 2024-01-02 10:00:00

ASnapshot table will have 4 rows: id=1 with value=100 (current=false), id=1 with value=110 (current=true), id=2 with value=200 (current=true), id=3 with value=300 (current=true).

BSnapshot table will have 3 rows: id=1 with value=110, id=2 with value=200, id=3 with value=300, all with current=true.

CSnapshot table will have 2 rows: id=1 with value=110, id=3 with value=300, both current=true; id=2 is removed.

DSnapshot table will have 3 rows: id=1 with value=100, id=2 with value=200, id=3 with value=300, all current=false.

Attempts:

2 left

🔧 Debug

advanced

2:00remaining

Identifying the Cause of Missing Historical Rows in Snapshot

A dbt snapshot is configured with the 'check_cols' strategy to track changes. However, after multiple runs, the snapshot table only contains the latest version of each row, with no historical versions. What is the most likely cause?

AThe snapshot is configured with 'strategy: timestamp' instead of 'check_cols'.

BThe source table has no updated rows, so no new snapshot rows are created.

CThe 'unique_key' is not correctly set, causing rows to overwrite each other.

DThe 'check_cols' parameter is set to an empty list, so no columns are checked for changes.

Attempts:

2 left

❓ data_output

advanced

2:00remaining

Number of Rows in Snapshot Table After Multiple Runs

Consider a snapshot configured with 'unique_key' on 'customer_id' and 'strategy: timestamp' using 'updated_at' column. The source table has 3 customers initially. Over 3 snapshot runs, customer 1 updates twice, customer 2 updates once, and customer 3 never updates. How many rows will the snapshot table contain after the third run?

A5 rows: 3 original + 2 updates for customer 1 + 1 update for customer 2 (duplicates counted).

B6 rows: 3 original + 2 updates for customer 1 + 1 update for customer 2.

C7 rows: 3 original + 2 updates for customer 1 + 1 update for customer 2 + 1 original for customer 3.

D6 rows: 1 original + 2 updates for customer 1 + 1 update for customer 2 + 2 for customer 3.

Attempts:

2 left

🚀 Application

expert

3:00remaining

Designing a Snapshot for Slowly Changing Dimensions

You need to track changes in a customer dimension table where some columns rarely change (e.g., phone number) and others change frequently (e.g., last login). You want to keep historical versions only when rare-change columns update, ignoring frequent changes. Which snapshot configuration best achieves this?

AUse 'strategy: check' with 'check_cols' set to only the rare-change columns.

BUse 'strategy: timestamp' with 'updated_at' column that updates on any change.

CUse 'strategy: check' with 'check_cols' set to all columns including frequent-change columns.

DUse 'strategy: timestamp' with a custom 'updated_at' column that updates on any change.

Attempts:

2 left