0
0
dbtdata~5 mins

Handling late-arriving data in dbt

Choose your learning style9 modes available
Introduction

Sometimes data comes late or out of order. Handling late-arriving data helps keep your reports accurate and up-to-date.

When sales data arrives a day or more after the sale happened.
When customer updates come after your daily report runs.
When sensor data is delayed due to network issues.
When you want to correct past data without breaking your reports.
Syntax
dbt
with source_data as (
    select * from {{ ref('raw_table') }}
),
updated_data as (
    select *,
        case
            when event_date < current_date - interval '1 day' then 'late'
            else 'on_time'
        end as data_status
    from source_data
)
select * from updated_data

Use case statements to flag late data.

Use ref() to refer to other dbt models or tables.

Examples
Flags events that arrived more than 1 hour late.
dbt
select *,
    case
        when event_timestamp < current_timestamp - interval '1 hour' then 'late'
        else 'on_time'
    end as data_status
from {{ ref('events') }}
Selects orders that arrived late by more than 2 days.
dbt
with late_data as (
    select * from {{ ref('orders') }}
    where order_date < current_date - interval '2 days'
)
select * from late_data
Sample Program

This dbt model flags sales as 'late' if they arrived more than one day after the sale date. It helps identify and handle late-arriving sales data.

dbt
with raw_sales as (
    select * from {{ ref('sales_raw') }}
),
flagged_sales as (
    select *,
        case
            when sale_date < current_date - interval '1 day' then 'late'
            else 'on_time'
        end as arrival_status
    from raw_sales
)
select sale_id, sale_date, arrival_status from flagged_sales order by sale_date
OutputSuccess
Important Notes

Late-arriving data can cause reports to be wrong if not handled.

Flagging late data helps you decide how to update or ignore it.

Use dbt's incremental models to update only changed or late data efficiently.

Summary

Late-arriving data means data that comes after expected time.

Flag or filter late data to keep your analysis accurate.

dbt helps manage late data with SQL and incremental models.