Unique key for merge behavior in dbt - Time & Space Complexity
When a dbt incremental model is configured with a `unique_key` and the `merge` strategy, dbt compiles it into a SQL `MERGE` statement, so the unique key directly shapes merge performance. The question we want to answer is how the time to merge grows as the data size increases. Consider the time complexity of this dbt merge operation:
```sql
merge into target_table as t
using source_table as s
on t.unique_key = s.unique_key
when matched then update set
    t.value = s.value
when not matched then insert (unique_key, value)
    values (s.unique_key, s.value);
```
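To make the matched-update / not-matched-insert behavior concrete, here is a runnable sketch using SQLite from Python. SQLite has no `MERGE` statement, so this uses an equivalent UPSERT (`ON CONFLICT ... DO UPDATE`); the table and column names mirror the example above but this is an illustration, not the SQL dbt actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_table (unique_key INTEGER PRIMARY KEY, value TEXT)")
conn.execute("CREATE TABLE source_table (unique_key INTEGER PRIMARY KEY, value TEXT)")

# Target has keys 1 and 2; source overlaps on key 2 and adds key 3.
conn.executemany("INSERT INTO target_table VALUES (?, ?)", [(1, "old"), (2, "old")])
conn.executemany("INSERT INTO source_table VALUES (?, ?)", [(2, "new"), (3, "new")])

# UPSERT standing in for MERGE: update on key match, insert otherwise.
# (The "WHERE true" is a SQLite parsing requirement when the source is a SELECT.)
conn.execute("""
    INSERT INTO target_table (unique_key, value)
    SELECT unique_key, value FROM source_table
    WHERE true
    ON CONFLICT (unique_key) DO UPDATE SET value = excluded.value
""")

rows = conn.execute(
    "SELECT unique_key, value FROM target_table ORDER BY unique_key"
).fetchall()
print(rows)  # key 1 untouched, key 2 updated, key 3 inserted
```

Running this shows key 1 left alone, key 2 rewritten to "new", and key 3 newly inserted, which is exactly the merge semantics the complexity analysis below is about.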
This statement merges rows from source_table into target_table, matching on the unique key: rows whose key already exists in the target are updated, and rows with no match are inserted.
Look at what repeats during the merge.
- Primary operation: Matching rows by unique key between source and target tables.
- How many times: Once for each row in the source table.
As the number of rows in the source table grows, the merge checks each row against the target.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 key lookups and updates/inserts |
| 100 | About 100 key lookups and updates/inserts |
| 1000 | About 1000 key lookups and updates/inserts |
Pattern observation: The work grows roughly in direct proportion to the number of rows to merge.
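The pattern in the table can be checked with a toy Python model of the merge. Here the target's key lookup is modeled as a dict (a hypothetical stand-in for an indexed unique key, not dbt internals), and the operation count comes out equal to the number of source rows.

```python
def merge(target: dict, source: dict) -> int:
    """Merge source into target keyed on unique_key; return the operation count."""
    ops = 0
    for key, value in source.items():  # once per source row
        ops += 1
        target[key] = value  # dict lookup + write models the indexed match
    return ops

for n in (10, 100, 1000):
    source = {k: f"v{k}" for k in range(n)}
    print(n, merge({}, source))  # operations grow in lockstep with n
```

Doubling the source doubles the operation count, which is the definition of linear growth.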
Time Complexity: O(n)
This means the time to merge grows linearly with the number of rows in the source table.
[X] Wrong: "The merge will be constant time because the unique key makes matching instant."
[OK] Correct: Even with a unique key, the database must still process each source row once, so time grows with the size of the source data.
Understanding how merge operations scale helps you explain data pipeline efficiency clearly and confidently.
What if the unique key were not indexed? How would the time complexity change?
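One way to reason about that question, sketched in Python: without an index, matching a source row means scanning the target row by row, so merging n source rows against m target rows costs on the order of n * m comparisons instead of n lookups. The function below is illustrative, not how a database executes the scan.

```python
def merge_unindexed(target_rows: list, source_rows: list) -> int:
    """Merge without an index: each source row scans the target. Returns comparisons."""
    comparisons = 0
    for s_key, s_val in source_rows:
        matched = False
        for i, (t_key, _) in enumerate(target_rows):
            comparisons += 1  # full scan to find a match: O(m) work per source row
            if t_key == s_key:
                target_rows[i] = (t_key, s_val)
                matched = True
                break
        if not matched:
            target_rows.append((s_key, s_val))
    return comparisons

# 100 source rows with keys disjoint from 100 target rows: every scan misses,
# so each source row compares against the whole (growing) target:
# 100 + 101 + ... + 199 = 14,950 comparisons -- versus 100 indexed lookups.
target = [(k, "old") for k in range(100)]
source = [(k, "new") for k in range(100, 200)]
print(merge_unindexed(target, source))
```

In other words, losing the index pushes the merge from O(n) toward O(n * m), which is why a merge on an unindexed key degrades sharply as both tables grow.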