0
0
dbtdata~10 mins

Slowly changing dimensions (SCD Type 2) in dbt - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Slowly changing dimensions (SCD Type 2)
Start with existing dimension table
New source data arrives
Compare source with dimension on business key
No change
Keep old row
Insert new row with updated data and new start date
Updated dimension table
This flow shows how SCD Type 2 tracks changes by expiring old rows and adding new rows for changes.
Execution Sample
dbt
WITH source_data AS (
  SELECT 1 AS id, 'Alice' AS name, '2024-01-01' AS effective_date
),
existing_dim AS (
  SELECT 1 AS id, 'Alicia' AS name, '2023-01-01' AS start_date, '9999-12-31' AS end_date
)
SELECT * FROM source_data
-- Further logic to compare and update dimension would follow here
This example compares new source data with existing dimension rows to detect changes and update accordingly.
Execution Table
StepActionSource DataExisting Dim RowChange Detected?Old Row End DateNew Row Inserted
1Load source data{id:1, name:'Alice', effective_date:'2024-01-01'}N/AN/AN/AN/A
2Load existing dimensionN/A{id:1, name:'Alicia', start_date:'2023-01-01', end_date:'9999-12-31'}N/AN/AN/A
3Compare source and existingname='Alice'name='Alicia'YesN/AN/A
4Expire old rowN/ASet end_date='2023-12-31'Yes2023-12-31No
5Insert new row{id:1, name:'Alice', start_date:'2024-01-01', end_date:'9999-12-31'}N/AYesN/AYes
6Final dimension stateN/ATwo rows: old expired, new activeN/A2023-12-31Yes
💡 Process stops after inserting new row and updating old row's end date.
Variable Tracker
VariableStartAfter Step 3After Step 4After Step 5Final
old_row_end_date9999-12-319999-12-312023-12-312023-12-312023-12-31
new_row_insertedNoNoNoYesYes
change_detectedNoYesYesYesYes
Key Moments - 3 Insights
Why do we keep the old row instead of just updating it?
Because SCD Type 2 tracks history, the old row is expired by setting an end date (see Step 4), so we keep past data intact.
How do we know when to insert a new row?
When the source data differs from the current active row (Step 3 shows change detected), we insert a new row with updated info (Step 5).
What does the end_date '9999-12-31' mean?
It marks the row as currently active; when a change happens, this end_date is updated to expire the row (Step 4).
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, at which step is the old row expired?
AStep 4
BStep 3
CStep 5
DStep 6
💡 Hint
Check the 'Old Row End Date' column in the execution table.
According to the variable tracker, what is the value of 'new_row_inserted' after Step 5?
ANo
BYes
CMaybe
DNot set
💡 Hint
Look at the 'new_row_inserted' row under 'After Step 5' in variable_tracker.
If the source data name matched the existing dimension name, what would happen in the execution table?
AChange detected would be Yes
BOld row would be expired
CNo new row inserted
DProcess would insert duplicate rows
💡 Hint
Refer to Step 3 and Step 5 in the execution table for change detection and row insertion.
Concept Snapshot
SCD Type 2 tracks history by:
- Comparing new data with current active rows
- If change, expire old row by setting end_date
- Insert new row with updated data and new start_date
- Use '9999-12-31' as active row end_date
- Keeps full change history in dimension table
Full Transcript
Slowly Changing Dimensions Type 2 (SCD Type 2) is a method to track changes in dimension data over time. When new source data arrives, it is compared to existing dimension rows using a business key. If no change is detected, the existing row remains unchanged. If a change is detected, the old row is expired by setting its end date to the day before the new data's start date. Then, a new row is inserted with the updated data and a start date matching the change date. The end date '9999-12-31' marks the row as currently active. This process preserves historical data and allows analysis of changes over time. The execution table shows each step: loading data, detecting changes, expiring old rows, and inserting new rows. The variable tracker follows key variables like old row end date and whether a new row was inserted. Key moments clarify why old rows are kept and how changes trigger new rows. The visual quiz tests understanding of these steps. This method is common in data warehousing and dbt models to maintain accurate historical records.