
What is SCD Type 2 in dbt: Explanation and Example

In dbt, SCD Type 2 (Slowly Changing Dimension Type 2) is a method for tracking historical changes in data by adding a new record for each change instead of overwriting the existing one. This preserves the full history, so you can see how a record looked at any point in time.

How It Works

SCD Type 2 works by keeping old records unchanged and adding new rows when data changes. Imagine a library card catalog where each time a book's location changes, you add a new card instead of erasing the old one. This way, you can always see where the book was at any past date.

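In dbt, this means your table has extra columns such as effective_date and end_date, and often a current_flag, to mark which record is active. When a change happens, a new row is inserted with the updated data and the old row is updated to show it is no longer current. dbt's built-in snapshots do this for you (using their own dbt_valid_from and dbt_valid_to columns); in a regular model, your SQL has to do it.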
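As a point of comparison, dbt's built-in snapshot feature handles the insert-and-close steps described above. The block below is a minimal sketch, not a drop-in file: the snapshot name customers_snapshot and the target schema snapshots are hypothetical, and it assumes raw_customers has a reliable updated_at column.

```sql
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',   -- hypothetical schema name
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

-- On each `dbt snapshot` run, dbt compares updated_at with what it stored before
select * from {{ ref('raw_customers') }}

{% endsnapshot %}
```

With this approach dbt adds dbt_valid_from and dbt_valid_to columns itself; by default the current version of a record is the one whose dbt_valid_to is null.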

Example

This example shows how to implement SCD Type 2 in dbt using SQL to track changes in a customer table.
sql
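-- Incremental model: each run emits only the rows that must change, and dbt
-- merges them into the existing table on (customer_id, effective_date).
-- Assumes a dbt version and adapter that accept a list for unique_key.
-- Limitation: two changes to one customer on the same day share the same key.
{{ config(materialized='incremental', unique_key=['customer_id', 'effective_date']) }}

with source_data as (
    select * from {{ ref('raw_customers') }}
),

latest_records as (
    -- rank each customer's source rows, newest first (rn = 1 is the latest)
    select *,
        row_number() over (partition by customer_id order by updated_at desc) as rn
    from source_data
),

{% if is_incremental() %}
changed as (
    -- new customers, plus customers whose name or email differs from the current row
    -- (!= never matches NULLs; add null handling if these columns are nullable)
    select s.customer_id, s.name, s.email
    from latest_records s
    left join {{ this }} t
      on s.customer_id = t.customer_id and t.current_flag = true
    where s.rn = 1
      and ((s.name != t.name or s.email != t.email) or t.customer_id is null)
),

scd_type_2 as (
    -- close the previously current row of each changed customer;
    -- end_date is the day the change was detected
    select
        t.customer_id,
        t.name,
        t.email,
        t.effective_date,
        current_date as end_date,
        false as current_flag
    from {{ this }} t
    inner join changed c
      on t.customer_id = c.customer_id and t.current_flag = true

    union all

    -- open a new current row with the updated values
    select
        customer_id,
        name,
        email,
        current_date as effective_date,
        cast('9999-12-31' as date) as end_date,
        true as current_flag
    from changed
)
{% else %}
scd_type_2 as (
    -- first run: every customer gets one open-ended current row
    select
        customer_id,
        name,
        email,
        current_date as effective_date,
        cast('9999-12-31' as date) as end_date,
        true as current_flag
    from latest_records
    where rn = 1
)
{% endif %}

select * from scd_type_2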
Output
customer_id | name     | email           | effective_date | end_date   | current_flag
------------|----------|-----------------|----------------|------------|-------------
1           | Alice    | alice@mail.com  | 2023-01-01     | 2023-06-01 | false
1           | Alice B. | aliceb@mail.com | 2023-06-02     | 9999-12-31 | true
2           | Bob      | bob@mail.com    | 2023-01-01     | 9999-12-31 | true
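
Because every row carries its validity range, a point-in-time lookup comes down to a range filter. The query below is a sketch under two assumptions: the SCD Type 2 model is named dim_customers (a hypothetical name), and both dates are inclusive, as in the sample output. The date itself is only an example.

```sql
-- How did each customer look on 2023-03-15?
select
    customer_id,
    name,
    email
from {{ ref('dim_customers') }}  -- hypothetical model name
where cast('2023-03-15' as date) between effective_date and end_date
```

Against the sample rows this would return the original Alice record and Bob, since the Alice B. row only starts on 2023-06-02. If your end_date equals the next row's effective_date, filter with `>= effective_date` and `< end_date` instead so that a boundary day matches only one row.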

When to Use

Use SCD Type 2 when you need to keep a full history of changes in your data, such as tracking customer address changes, product price updates, or employee role changes. This is important for accurate reporting over time, auditing, and understanding trends.

For example, if a customer changes their email, SCD Type 2 lets you see both the old and new emails with the dates they were valid. This is useful in marketing analysis, compliance, and customer service.
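
The email example can be sketched as a query that lists each version of a customer next to the value it replaced. It assumes the SCD Type 2 table is exposed as a model named dim_customers (a hypothetical name) with the columns shown in the example output.

```sql
-- Email history for one customer, oldest version first
select
    customer_id,
    -- email on the preceding version; null for the first version
    lag(email) over (partition by customer_id order by effective_date) as previous_email,
    email,
    effective_date,
    end_date
from {{ ref('dim_customers') }}  -- hypothetical model name
where customer_id = 1
order by effective_date
```

The window function only looks at rows for the same customer, so removing the where clause gives the same history for every customer at once.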

Key Points

  • SCD Type 2 preserves historical data by adding new rows for changes.
  • It uses columns like effective_date, end_date, and current_flag to track record validity.
  • dbt models can implement SCD Type 2 with SQL logic to merge new and existing data.
  • This method supports accurate time-based analysis and auditing.

Key Takeaways

  • SCD Type 2 in dbt tracks full history by adding new rows for data changes.
  • It uses date and flag columns to mark active and historical records.
  • Use it when you need to analyze how data evolved over time.
  • dbt SQL models can implement SCD Type 2 with simple merge logic.