dbt-utils (surrogate_key, pivot, unpivot) - Time & Space Complexity
We want to understand how the time needed to run dbt-utils macros changes as the data grows.
Specifically, how do surrogate_key, pivot, and unpivot behave as the input tables get bigger?
Let's analyze the time complexity of the following dbt-utils macro usage.
```sql
-- Create a surrogate key from multiple columns
select
    {{ dbt_utils.surrogate_key(['user_id', 'event_date']) }} as surrogate_key,
    user_id,
    event_date,
    event_type
from {{ ref('events') }}
```
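To see where the linear cost comes from, it helps to look at what the macro compiles to. On most adapters, surrogate_key expands to roughly the per-row expression below (the exact cast and concatenation syntax varies by warehouse, and newer dbt-utils versions rename the macro to generate_surrogate_key):

```sql
-- One md5 hash computed per row: constant work per row, O(n) overall
md5(cast(
    coalesce(cast(user_id as varchar), '') || '-' ||
    coalesce(cast(event_date as varchar), '')
as varchar)) as surrogate_key
```

Because the hash is a fixed amount of work applied once per row, the total cost tracks the row count.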
```sql
-- Pivot event types into columns (one count per event type)
-- pivot takes the column name and value list positionally;
-- the default agg='sum' with then 1 / else 0 counts matching rows
select
    user_id,
    {{ dbt_utils.pivot(
        'event_type',
        ['click', 'view', 'purchase']
    ) }}
from {{ ref('events') }}
group by user_id
```
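Under the hood, pivot simply emits one aggregated case expression per value, so the whole transformation is a single table scan. A rough sketch of the compiled SQL (aliasing and quoting vary by adapter):

```sql
-- Three case checks per row = constant work per row; one scan of the table
select
    user_id,
    sum(case when event_type = 'click' then 1 else 0 end) as "click",
    sum(case when event_type = 'view' then 1 else 0 end) as "view",
    sum(case when event_type = 'purchase' then 1 else 0 end) as "purchase"
from analytics.events
group by user_id
```

Each extra pivoted value adds one more case expression per row, a constant-factor increase, not a change in growth rate.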
```sql
-- Unpivot event counts back to rows
-- unpivot takes an exclude list (columns kept as identifiers) rather than
-- an explicit column list; every remaining column is unpivoted.
-- Assumes pivoted_events keeps user_id as its grouping key.
{{ dbt_utils.unpivot(
    relation=ref('pivoted_events'),
    exclude=['user_id'],
    field_name='event_type',
    value_name='event_count'
) }}
```
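unpivot compiles to one select per unpivoted column, glued together with union all. Roughly (casts and quoting vary by adapter):

```sql
-- One branch per unpivoted column: k scans of n rows, still O(n) for fixed k
select user_id, cast('click' as varchar) as event_type, cast(click as varchar) as event_count
from analytics.pivoted_events
union all
select user_id, cast('view' as varchar), cast(view as varchar)
from analytics.pivoted_events
union all
select user_id, cast('purchase' as varchar), cast(purchase as varchar)
from analytics.pivoted_events
```

With k unpivoted columns the relation is scanned k times, but k is fixed by the model definition, so the cost still grows linearly with the number of rows.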
This code creates a unique key, then pivots and unpivots event data for analysis.
To reason about complexity, look at what repeats as the data grows.
- Primary operation: Scanning all rows in the input table for each macro.
- How many times: Once per row for surrogate_key; for pivot and unpivot, once per row for each of the k pivoted values, where k is a fixed constant (three event types here).
As the number of rows (n) grows, the total work scales roughly with n.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row scans |
| 100 | About 100 row scans |
| 1000 | About 1000 row scans |
Pattern observation: The work grows directly with the number of rows, so doubling the rows roughly doubles the work.
Time Complexity: O(n)
This means the time to run these macros grows linearly with the number of rows in the data.
[X] Wrong: "Pivoting or unpivoting will multiply the time by the number of columns involved."
[OK] Correct: The number of pivoted values does multiply the per-row work, but it is a small fixed constant (three here), so the growth rate is still set by the number of rows: O(n).
Understanding how data transformations scale helps you write efficient data models and explain your choices clearly.
"What if we added nested pivoting inside the unpivot macro? How would that affect the time complexity?"