Staging, intermediate, and marts pattern in dbt - Time & Space Complexity
When using the staging, intermediate, and marts pattern in dbt, we want to understand how the time to build models grows as the data grows.
The question: how does processing time change in each layer as the input gets bigger?
Let's analyze the time complexity of this dbt model sequence.
```sql
-- staging/customers.sql
select * from raw.customers

-- intermediate/orders_clean.sql
select * from raw.orders where status = 'complete'

-- marts/sales_summary.sql
select customer_id, count(*) as total_orders
from intermediate.orders_clean
group by customer_id
```
This code shows three layers: staging copies the raw data, intermediate filters orders down to completed ones, and the marts layer aggregates orders per customer.
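To make the per-layer work concrete, here is a minimal Python sketch that models each layer as a single pass over in-memory rows. The field names (`status`, `customer_id`) mirror the SQL above; the sample data and function names are illustrative, not part of dbt.

```python
def staging_customers(raw_customers):
    # staging: select * -- one pass that copies every row
    return [dict(row) for row in raw_customers]

def intermediate_orders_clean(raw_orders):
    # intermediate: keep only completed orders -- one pass
    return [row for row in raw_orders if row["status"] == "complete"]

def marts_sales_summary(orders_clean):
    # marts: group by customer_id and count -- one pass
    summary = {}
    for row in orders_clean:
        summary[row["customer_id"]] = summary.get(row["customer_id"], 0) + 1
    return summary

# Tiny illustrative dataset
raw_orders = [
    {"customer_id": 1, "status": "complete"},
    {"customer_id": 1, "status": "complete"},
    {"customer_id": 2, "status": "pending"},
    {"customer_id": 2, "status": "complete"},
]
print(marts_sales_summary(intermediate_orders_clean(raw_orders)))
# → {1: 2, 2: 1}
```

Each function touches every input row exactly once, which is why the layers stack additively rather than multiplying each other's cost.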
Look at the main repeated work in each layer.
- Primary operation: scanning rows in a table (a full `select *` or a filter)
- How many times: once per layer; each layer processes all of its input rows
As input rows increase, each layer processes more data.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 rows scanned per layer |
| 100 | About 100 rows scanned per layer |
| 1000 | About 1000 rows scanned per layer |
Pattern observation: The work grows roughly in direct proportion to the number of rows.
Time Complexity: O(n)
This means the time to build these models grows linearly as the data size grows.
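The pattern in the table can be checked with a small counter sketch. Assuming, for illustration, that each layer's input is about n rows, the total number of row scans is the sum of three single passes:

```python
def total_scans(n):
    # Each of the three layers scans its input once.
    scans = 0
    scans += n  # staging: scans all raw customer rows (assume ~n)
    scans += n  # intermediate: scans all raw order rows (assume ~n)
    scans += n  # marts: scans the filtered rows (at most n)
    return scans

for n in (10, 100, 1000):
    print(n, total_scans(n))  # total grows in direct proportion to n
```

Tripling the layers only multiplies the work by a constant (3n is still O(n)); the growth rate with respect to data size stays linear.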
[X] Wrong: "Adding more layers multiplies the time complexity exponentially."
[OK] Correct: Each layer scans its input once, so the total work is the sum of the per-layer costs; three linear layers still add up to O(n), not exponential growth.
Understanding how data flows through layers and affects processing time helps you design efficient data pipelines and explain your approach clearly.
"What if the marts layer added a nested loop join on a large table? How would the time complexity change?"
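As a hint for the question above, here is an illustrative Python sketch of a nested loop join: for each of the n order rows it scans all m customer rows, so comparisons grow as n × m (roughly quadratic when both tables grow together), rather than linearly. The data and function name are hypothetical.

```python
def nested_loop_join(orders, customers):
    # For each order row, scan every customer row: n * m comparisons.
    joined, comparisons = [], 0
    for o in orders:
        for c in customers:
            comparisons += 1
            if o["customer_id"] == c["customer_id"]:
                joined.append({**o, **c})
    return joined, comparisons

orders = [{"customer_id": i % 3, "amount": 10} for i in range(6)]
customers = [{"customer_id": i, "name": f"c{i}"} for i in range(3)]
joined, comparisons = nested_loop_join(orders, customers)
print(comparisons)  # 6 orders * 3 customers = 18 comparisons
```

With such a join in the marts layer, that one model would dominate the pipeline at O(n × m), even though the earlier layers remain linear. In practice, warehouses usually choose hash or merge joins, which scale far better.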