What is SCD Type 2 in dbt: Explanation and Example
In dbt, SCD Type 2 (Slowly Changing Dimension Type 2) is a method to track historical changes in data by creating a new record for each change instead of overwriting the existing one. This preserves full history, allowing you to see how the data looked at any point in time.
How It Works
SCD Type 2 works by keeping old records unchanged and adding new rows when data changes. Imagine a library card catalog where each time a book's location changes, you add a new card instead of erasing the old one. This way, you can always see where the book was at any past date.
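In dbt, this means your table will have extra columns such as effective_date and end_date, or a current_flag, to mark which record is active. When a change happens, a new row is inserted with the updated data and the old row is updated to show it is no longer current. dbt's built-in snapshots handle both steps automatically (using the columns dbt_valid_from and dbt_valid_to); a hand-written model has to implement them in SQL.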
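The insert-and-close step described above can be sketched in plain SQL. This is only an illustration of the idea, not what dbt generates: the customers_history table, the customer id, the emails, and the dates are all made up for the example.

```sql
-- Hypothetical history table; column names follow the ones mentioned above.
create table customers_history (
    customer_id    integer,
    email          varchar(255),
    effective_date date,
    end_date       date,
    current_flag   boolean
);

-- Initial version of customer 1, open-ended.
insert into customers_history values
    (1, 'old@example.com', cast('2024-01-01' as date), cast('9999-12-31' as date), true);

-- On 2024-06-01 the email changes.
-- Step 1: close the current row instead of overwriting its data.
update customers_history
set end_date = cast('2024-06-01' as date),
    current_flag = false
where customer_id = 1
  and current_flag = true;

-- Step 2: add the new version as the current row.
insert into customers_history values
    (1, 'new@example.com', cast('2024-06-01' as date), cast('9999-12-31' as date), true);
```

After these statements the table holds two rows for customer 1: a closed row valid from 2024-01-01 to 2024-06-01, and a current row valid from 2024-06-01 onward.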
Example
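{{ config(materialized='incremental', unique_key=['customer_id', 'effective_date']) }}

-- Assumes raw_customers has customer_id, name, email, updated_at (name and email non-null)
-- and that the model runs at most once per day.
with latest_records as (
    select customer_id, name, email
    from (
        select *, row_number() over (partition by customer_id order by updated_at desc) as rn
        from {{ ref('raw_customers') }}
    ) ranked
    where rn = 1  -- newest source row per customer
)
{% if is_incremental() %}
, changed as (
    -- new customers, or customers whose tracked columns differ from their current row
    select s.*, t.name as old_name, t.email as old_email, t.effective_date as old_effective_date
    from latest_records s
    left join {{ this }} t on s.customer_id = t.customer_id and t.current_flag = true
    where t.customer_id is null or s.name != t.name or s.email != t.email
)
-- close the superseded row (same unique_key, so dbt updates it in place)
select customer_id, old_name as name, old_email as email, old_effective_date as effective_date,
       current_date as end_date, false as current_flag
from changed
where old_effective_date is not null
union all
{% endif %}
-- open a new current row (on the first run, one per customer)
select customer_id, name, email, current_date as effective_date,
       cast('9999-12-31' as date) as end_date, true as current_flag
from {% if is_incremental() %}changed{% else %}latest_records{% endif %}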
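As an alternative to writing this logic by hand, dbt ships a built-in snapshot feature that maintains SCD Type 2 history for you. The sketch below assumes the same raw_customers relation with a reliable updated_at column; the snapshot name customers_snapshot and the target schema snapshots are placeholders. Newer dbt versions also let you configure snapshots in YAML instead.

```sql
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',   -- placeholder schema name
        unique_key='customer_id',    -- identifies one customer across versions
        strategy='timestamp',        -- detect changes via the updated_at column
        updated_at='updated_at'
    )
}}

-- Assumes raw_customers exposes customer_id, name, email, updated_at.
select * from {{ ref('raw_customers') }}

{% endsnapshot %}
```

Running dbt snapshot builds or updates the table. dbt adds the columns dbt_valid_from and dbt_valid_to (null for the current row), along with dbt_scd_id and dbt_updated_at, so you do not define effective_date, end_date, or current_flag yourself.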
When to Use
Use SCD Type 2 when you need to keep a full history of changes in your data, such as tracking customer address changes, product price updates, or employee role changes. This is important for accurate reporting over time, auditing, and understanding trends.
For example, if a customer changes their email, SCD Type 2 lets you see both the old and new emails with the dates they were valid. This is useful in marketing analysis, compliance, and customer service.
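A point-in-time lookup like the one just described might look as follows. This is a sketch under assumptions: the model name dim_customers_scd2, the customer id, and the date are hypothetical, and end_date is assumed to be the day the next version took effect, so the validity interval is treated as half-open.

```sql
-- Which email was valid for customer 42 on 2024-03-15?
select customer_id, email, effective_date, end_date
from {{ ref('dim_customers_scd2') }}   -- hypothetical SCD Type 2 model
where customer_id = 42
  and effective_date <= cast('2024-03-15' as date)
  and end_date > cast('2024-03-15' as date)
```

Dropping the two date conditions returns the customer's full history, and filtering on current_flag = true returns only the latest version.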
Key Points
- SCD Type 2 preserves historical data by adding new rows for changes.
- It uses columns like effective_date, end_date, and current_flag to track record validity.
- dbt models can implement SCD Type 2 with SQL logic to merge new and existing data.
- This method supports accurate time-based analysis and auditing.