0
0
dbtdata~30 mins

Handling late-arriving data in dbt - Mini Project: Build & Apply

Choose your learning style9 modes available
Handling Late-Arriving Data in dbt
📖 Scenario: You work for an online store that receives daily sales data. Sometimes, sales records arrive late, after the daily report is generated. You want to handle these late-arriving records properly in your data model.
🎯 Goal: Build a dbt model that correctly handles late-arriving sales data by updating the daily sales summary to include any late records.
📋 What You'll Learn
Create a source table with sales data including late-arriving records
Add a configuration variable to define the cutoff date for late data
Write a dbt model SQL query that filters and aggregates sales data including late arrivals
Output the final daily sales summary including late-arriving data
💡 Why This Matters
🌍 Real World
Many businesses receive data late due to delays in data collection or transmission. Handling late-arriving data ensures reports and analyses are accurate and up to date.
💼 Career
Data analysts and engineers often need to build data models that correctly incorporate late-arriving data to maintain data quality and trust in business intelligence.
Progress0 / 4 steps
1
Create the source sales data table
Create a source table called sales_data with these exact columns and values: order_id (1, 2, 3, 4), order_date ('2024-06-01', '2024-06-01', '2024-06-02', '2024-05-31'), and amount (100, 150, 200, 50). Write the SQL to create this table with these rows.
dbt
Need a hint?

Use CREATE TABLE and insert the exact rows with VALUES.

2
Add a cutoff date configuration variable
Create a variable called cutoff_date and set it to the string '2024-06-01'. This variable will be used to identify late-arriving data.
dbt
Need a hint?

Use dbt Jinja syntax to set the variable: {% set cutoff_date = '2024-06-01' %}.

3
Write the dbt model SQL to handle late-arriving data
Write a SQL query that selects order_date and sums amount as total_amount from sales_data. Include all rows where order_date is less than or equal to the cutoff_date variable. Group the results by order_date.
dbt
Need a hint?

Use where order_date <= '{{ cutoff_date }}' to filter dates and group by order_date to aggregate.

4
Output the final daily sales summary including late-arriving data
Print the results of the query to show the daily total sales amounts including late-arriving data.
dbt
Need a hint?

Run the query to see the daily totals including the late-arriving data from '2024-05-31'.