dbt · data · ~20 mins

Full refresh vs incremental in dbt - Practice Questions

Challenge - 5 Problems
🧠 Conceptual · intermediate
Understanding Full Refresh in dbt

Which statement best describes what happens during a full refresh in dbt?

A. dbt updates only the changed rows in the existing table using merge operations.
B. dbt deletes the existing table and rebuilds it completely from the source data.
C. dbt only adds new rows to the existing table without modifying existing data.
D. dbt skips the model and uses cached results from the previous run.
💡 Hint

Think about what happens when you want to start fresh with your data model.
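A minimal pandas sketch of full-refresh behavior (hypothetical table contents, not dbt internals):

```python
import pandas as pd

# Illustrative data: the existing target holds a stale row.
existing_target = pd.DataFrame({'id': [1], 'value': ['stale']})
source = pd.DataFrame({'id': [1, 2], 'value': ['a', 'b']})

# Full refresh: the prior target is discarded and the table is
# rebuilt entirely from the current source data.
target = source.copy()
```

In dbt, this is what `dbt run --full-refresh` forces for an incremental model: the existing relation is dropped and recreated from scratch.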

🧠 Conceptual · intermediate
Incremental Model Behavior

What is the main advantage of using an incremental model in dbt?

A. It processes only new or changed data, reducing runtime and resource use.
B. It rebuilds the entire table every time to ensure data accuracy.
C. It automatically archives old data to a separate table.
D. It disables all data validations during the run.
💡 Hint

Consider how to save time when working with large datasets that update frequently.
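A pandas sketch of incremental loading with a high-water mark (illustrative data, not dbt's actual execution):

```python
import pandas as pd

# Existing target and the current source snapshot (illustrative).
target = pd.DataFrame({'id': [1, 2],
                       'updated_at': ['2024-01-01', '2024-01-02']})
source = pd.DataFrame({'id': [1, 2, 3],
                       'updated_at': ['2024-01-01', '2024-01-02', '2024-01-03']})

# Only rows newer than the target's max timestamp are processed,
# so unchanged history is never re-read.
watermark = target['updated_at'].max()
new_rows = source[source['updated_at'] > watermark]
target = pd.concat([target, new_rows], ignore_index=True)
```

Only one of the three source rows is touched on this run; on a table with millions of rows that is where the runtime and cost savings come from.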

Data Output · advanced
Output of Incremental Model Run

Given this incremental model SQL snippet, what will be the content of the target table after running the model twice?

-- model.sql
{{ config(materialized='incremental', unique_key='id') }}

select id, value, updated_at
from source_table
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}

Assume source_table initially has rows with ids 1 and 2, then a new row with id 3 is added before the second run.

python
import pandas as pd

# Initial source_table data
source_table_1 = pd.DataFrame({'id': [1, 2], 'value': ['a', 'b'], 'updated_at': ['2024-01-01', '2024-01-02']})

# After first run, target table content
first_run = source_table_1[['id', 'value']]

# New data added
source_table_2 = pd.DataFrame({'id': [1, 2, 3], 'value': ['a', 'b', 'c'], 'updated_at': ['2024-01-01', '2024-01-02', '2024-01-03']})

# After second run, incremental adds only new row
second_run = pd.concat([first_run, source_table_2[source_table_2['id'] == 3][['id', 'value']]], ignore_index=True)

second_run
A. [{'id': 1, 'value': 'a'}, {'id': 2, 'value': 'b'}, {'id': 3, 'value': 'c'}]
B. [{'id': 3, 'value': 'c'}]
C. [{'id': 1, 'value': 'a'}, {'id': 2, 'value': 'b'}]
D. []
💡 Hint

Think about how incremental models add new rows without deleting existing ones.
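A pandas sketch of what `unique_key` adds on top of plain appending (illustrative data): a batch row whose key already exists replaces the old row, while genuinely new keys are appended.

```python
import pandas as pd

target = pd.DataFrame({'id': [1, 2], 'value': ['a', 'b']})
batch = pd.DataFrame({'id': [2, 3], 'value': ['b2', 'c']})

# Sketch of merge / delete+insert semantics keyed on id:
# keep='last' lets the incoming batch row win over the existing one.
merged = (pd.concat([target, batch])
            .drop_duplicates(subset='id', keep='last')
            .sort_values('id')
            .reset_index(drop=True))
```

Without a `unique_key`, dbt simply appends the batch; with one, existing rows for matching keys are updated instead of duplicated.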

🔧 Debug · advanced
Error in Incremental Model Unique Key

Consider this dbt incremental model configuration:

{{ config(materialized='incremental', unique_key='user_id') }}

The source data has duplicate user_id values in the new data batch. What error or issue will most likely occur when running this model?

A. dbt will silently drop duplicate rows without warning.
B. dbt will fail with a syntax error due to duplicate keys.
C. dbt will raise a unique key violation error during the merge step.
D. dbt will rebuild the entire table ignoring incremental logic.
💡 Hint

Think about what happens if the unique key is not unique in the incremental data.
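A quick pandas check for the failure condition (hypothetical batch data): when the configured `unique_key` repeats within one batch, a warehouse `MERGE` would have to match a single target row against two source rows, which is ambiguous, and the run fails.

```python
import pandas as pd

# Hypothetical batch where user_id (the configured unique_key) repeats.
batch = pd.DataFrame({'user_id': [7, 7], 'plan': ['free', 'pro']})

# Detecting the problem before the merge: any duplicated key in the
# incremental batch means the merge cannot resolve a single winner.
has_duplicate_keys = bool(batch['user_id'].duplicated().any())
```

The usual fix is to deduplicate upstream, e.g. with a `row_number()` window in the model's SQL, so the batch carries at most one row per key.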

🚀 Application · expert
Choosing Between Full Refresh and Incremental

You manage a dbt model that processes millions of rows daily. The source data sometimes has late-arriving updates for past dates. Which approach best balances performance and data accuracy?

A. Disable incremental processing and rely on snapshots alone to track changes.
B. Always use full refresh to ensure all data is accurate, despite longer runtimes.
C. Use incremental models with no reprocessing window and rely on source data correctness.
D. Use incremental models with a lookback window of a few days to reprocess recent data, plus a weekly full refresh.
💡 Hint

Consider how to handle late data updates while keeping runtimes reasonable.
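A pandas sketch of the lookback-window pattern (illustrative data): rows inside the window are recomputed from source, so late-arriving updates are picked up, while older rows are left untouched.

```python
import pandas as pd

target = pd.DataFrame({'id': [1, 2, 3],
                       'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
                       'value': ['a', 'b', 'c']})
# The source later corrects the row for 2024-01-02 and adds a new day.
source = pd.DataFrame({'id': [1, 2, 3, 4],
                       'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']),
                       'value': ['a', 'b_late_fix', 'c', 'd']})

# Reprocess everything within a 2-day lookback window plus new data.
cutoff = target['date'].max() - pd.Timedelta(days=2)
kept = target[target['date'] < cutoff]            # outside the window: untouched
reprocessed = source[source['date'] >= cutoff]    # inside the window: rebuilt
target = pd.concat([kept, reprocessed]).sort_values('id').reset_index(drop=True)
```

The window bounds how much data each run touches, so runtimes stay close to pure incremental while late corrections still land; the occasional full refresh catches anything older than the window.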