Handling late-arriving data in dbt - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why is handling late-arriving data important in data pipelines?

Late-arriving data can cause issues in analytics and reporting. Which of the following best explains why handling late-arriving data is important?

A. It ensures data freshness by discarding any data that arrives late to keep reports fast.
B. It allows the data pipeline to update historical records accurately when delayed data arrives.
C. It prevents any data from being stored if it arrives after the scheduled pipeline run time.
D. It automatically deletes old data to make room for late-arriving data.
💡 Hint

Think about how late data affects historical analysis and accuracy.
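
To make the hint concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for a warehouse. The table name raw_events comes from the challenges below; the columns event_id and amount, and all values, are assumptions made up for illustration. It shows a daily total that was already reported changing once a delayed row for that day arrives.

```python
import sqlite3

# In-memory database standing in for the warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("create table raw_events (event_id integer, event_date text, amount real)")
conn.executemany(
    "insert into raw_events values (?, ?, ?)",
    [(1, "2024-01-05", 10.0), (2, "2024-01-05", 20.0), (3, "2024-01-06", 5.0)],
)


def daily_total(day):
    """Return the summed amount currently stored for one event_date."""
    row = conn.execute(
        "select coalesce(sum(amount), 0) from raw_events where event_date = ?",
        (day,),
    ).fetchone()
    return row[0]


# Figure a report built right after Jan 5 would have shown.
before = daily_total("2024-01-05")

# A delayed event for Jan 5 lands after that report was built.
conn.execute("insert into raw_events values (4, '2024-01-05', 7.5)")
after = daily_total("2024-01-05")

print(before, after)
```

A pipeline that never revisits Jan 5 keeps serving the earlier figure even though the source now holds more data for that day.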

Predict Output
intermediate
Output of dbt incremental model with late-arriving data

Consider a dbt incremental model that uses is_incremental() to update records. What will be the output after running this model twice if late-arriving data for an existing date is included in the second run?

dbt
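{{ config(materialized='incremental') }}

with source_data as (
  select * from {{ ref('raw_events') }}
),
updates as (
  select * from source_data
  {% if is_incremental() %}
  -- on incremental runs, keep only rows at or after the latest loaded date
  where event_date >= (select max(event_date) from {{ this }})
  {% endif %}
)

select * from updates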

A. The model will update existing dates with late-arriving data and append new dates.
B. The model will append only new dates, ignoring late-arriving data for existing dates.
C. The model will overwrite the entire table with only the latest run's data.
D. The model will fail because is_incremental() is missing.
💡 Hint

Think about how incremental models handle data for existing keys.

Data Output
advanced
Resulting row count after handling late-arriving data

You have a table with 1000 rows for dates Jan 1-10. On Jan 11, 50 late-arriving rows for Jan 5 arrive. After running a dbt incremental model that merges late data correctly, how many rows will the table have?

A. 1000 rows, because late-arriving rows replace existing rows for Jan 5.
B. 950 rows, because some rows are deleted during late data processing.
C. 1050 rows, because late-arriving rows are appended without deduplication.
D. 1100 rows, because late-arriving rows are duplicated.
💡 Hint

Consider how merging late-arriving data affects existing rows.
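
The effect of a merge on row counts can be explored with a small simulation. The sketch below uses Python's sqlite3 module, not dbt itself, and assumes a unique key column named event_id, which the problem statement does not specify. The helper load and all sample rows are hypothetical. It compares a keyed merge (insert or replace) with a plain append, for late rows that either match an existing key or carry a new one.

```python
import sqlite3


def load(late_rows, merge):
    """Load late rows into a 3-row target table and return the final row count.

    merge=True  -> rows are keyed on event_id; a matching key is replaced.
    merge=False -> rows are appended with no key check.
    """
    conn = sqlite3.connect(":memory:")
    key = "primary key" if merge else ""
    conn.execute(
        f"create table events (event_id integer {key}, event_date text, amount real)"
    )
    conn.executemany(
        "insert into events values (?, ?, ?)",
        [(1, "2024-01-04", 1.0), (2, "2024-01-05", 2.0), (3, "2024-01-05", 3.0)],
    )
    verb = "insert or replace" if merge else "insert"
    conn.executemany(f"{verb} into events values (?, ?, ?)", late_rows)
    return conn.execute("select count(*) from events").fetchone()[0]


# Late row that corrects an existing event (same key, assumed for illustration).
correction = [(2, "2024-01-05", 2.5)]
# Late row that is a previously unseen event (new key).
new_event = [(4, "2024-01-05", 9.0)]

print(load(correction, merge=True))   # key matched, row replaced
print(load(correction, merge=False))  # no key check, row appended
print(load(new_event, merge=True))    # key not found, row inserted
```

Under these assumptions the count only stays flat when the late rows share keys with rows already in the table and the load is keyed.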

🔧 Debug
advanced
Identify the error in this dbt incremental model handling late data

Review the following dbt model code snippet meant to handle late-arriving data. What error will occur when running it?

dbt
{{ config(materialized='incremental') }}

select * from {{ ref('raw_events') }}
where event_date > (select max(event_date) from {{ this }})

A. The model will cause a runtime error because {{ this }} is undefined.
B. The model will fail with a syntax error due to missing is_incremental() check.
C. The model will run but ignore late-arriving data for existing event_date equal to max(event_date).
D. The model will overwrite the entire table instead of incrementally updating.
💡 Hint

Think about the filter condition and how it handles data equal to max date.
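
To see what the comparison operator changes, the following sketch runs the same kind of filter against a tiny sqlite3 database in Python. The table target stands in for {{ this }}; the event_id column and all rows are assumptions for illustration, and selected_ids is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table target (event_id integer, event_date text)")
conn.execute("create table raw_events (event_id integer, event_date text)")

# The target already holds data through Jan 10.
conn.executemany(
    "insert into target values (?, ?)",
    [(1, "2024-01-09"), (2, "2024-01-10")],
)
# The source holds those rows plus three new arrivals.
conn.executemany(
    "insert into raw_events values (?, ?)",
    [
        (1, "2024-01-09"),
        (2, "2024-01-10"),
        (3, "2024-01-10"),  # late row for the current max date
        (4, "2024-01-05"),  # late row for an older date
        (5, "2024-01-11"),  # row for a brand-new date
    ],
)


def selected_ids(op):
    """Return source event_ids passing `event_date <op> max(target.event_date)`."""
    assert op in (">", ">=")  # op is interpolated into the SQL, so restrict it
    sql = (
        "select event_id from raw_events "
        f"where event_date {op} (select max(event_date) from target) "
        "order by event_id"
    )
    return [row[0] for row in conn.execute(sql)]


strict = selected_ids(">")
inclusive = selected_ids(">=")
print(strict, inclusive)
```

In this toy data the strict filter selects only the row for the new date, while the inclusive filter also selects both Jan 10 rows, including one that is already in the target. Neither filter reaches the late row for Jan 5.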

🚀 Application
expert
Best approach to handle late-arriving data in dbt incremental models

You want to ensure your dbt incremental model correctly updates records when late-arriving data comes in for any date, including past dates. Which approach below is best?

A. Always run the model as full-refresh to ensure all late-arriving data is included.
B. Use is_incremental() and filter source data with event_date >= min(event_date) to reprocess all data.
C. Use is_incremental() and filter source data with event_date > max(event_date) to append only new dates.
D. Use is_incremental() and filter source data with event_date >= (select min(event_date) from {{ this }}) to merge late-arriving data for all dates in the table.
💡 Hint

Think about how to include late-arriving data for any date already in the table without full refresh.
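
One way to compare the filters named in the options is to count which source rows each would pick up. The sketch below does this in Python with sqlite3 rather than in dbt; target stands in for {{ this }}, and the event_id column, the sample rows, and the helper rows_from are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table target (event_id integer, event_date text)")
conn.execute("create table raw_events (event_id integer, event_date text)")

# The target covers Jan 1 through Jan 10.
loaded = [(1, "2024-01-01"), (2, "2024-01-05"), (3, "2024-01-10")]
conn.executemany("insert into target values (?, ?)", loaded)
conn.executemany(
    "insert into raw_events values (?, ?)",
    loaded
    + [
        (4, "2024-01-05"),  # late row inside the loaded date range
        (5, "2023-12-30"),  # late row older than anything loaded
    ],
)


def rows_from(bound):
    """Return source event_ids with event_date >= min or max of the target."""
    assert bound in ("min", "max")  # bound is interpolated into the SQL
    sql = (
        "select event_id from raw_events "
        f"where event_date >= (select {bound}(event_date) from target) "
        "order by event_id"
    )
    return [row[0] for row in conn.execute(sql)]


from_min = rows_from("min")
from_max = rows_from("max")
print(from_min, from_max)
```

In this toy data the min-based filter picks up the late Jan 5 row, but it also re-selects every row already loaded, so it needs a keyed merge to avoid duplicates and scans about as much source data as a rebuild. It still misses the row dated before the earliest loaded date.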