# Materializations (view, table, incremental, ephemeral) in dbt - Time & Space Complexity
When choosing a dbt materialization, it is important to understand how the time to build a model grows as the underlying data grows. The goal here is to analyze how each materialization affects the work done per run as `source_table` scales.
```sql
-- View materialization
{{ config(materialized='view') }}
select * from source_table
```

```sql
-- Table materialization
{{ config(materialized='table') }}
select * from source_table
```

```sql
-- Incremental materialization
-- Note: the filter is guarded with is_incremental() so the first run
-- (when {{ this }} does not exist yet) can do a full build.
{{ config(materialized='incremental') }}
select * from source_table
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

```sql
-- Ephemeral materialization
{{ config(materialized='ephemeral') }}
select * from source_table
```
These snippets show different ways dbt builds models from the same source data. To compare them, look at how often the data is scanned.
- Primary operation: scanning rows from `source_table`.
- How many times:
  - View and Table: a full scan on every run (for a view, the scan happens on every query against it, since a view stores no data).
  - Incremental: after the first run, scans only new or changed rows.
  - Ephemeral: inlined as a CTE into each downstream model; nothing is stored, so the full scan is repeated every time the model is referenced.
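The scan counts above can be sketched as a small cost model. This is an illustrative simplification, not dbt's actual execution logic; the names `n_total`, `n_new`, and `n_refs` are assumptions standing in for total rows, new rows, and downstream references.

```python
# Hypothetical cost model: rows read from source_table in a single run
# (or query, for views) under each materialization. Illustrative only.

def rows_scanned(materialization, n_total, n_new=0, n_refs=1):
    """Return the number of source rows scanned for one run/query."""
    if materialization in ("view", "table"):
        return n_total            # full scan every run (or every query, for a view)
    if materialization == "incremental":
        return n_new              # only new or changed rows after the first run
    if materialization == "ephemeral":
        return n_total * n_refs   # inlined CTE: full scan per downstream reference
    raise ValueError(f"unknown materialization: {materialization}")

print(rows_scanned("table", 1_000_000))                      # 1000000
print(rows_scanned("incremental", 1_000_000, n_new=10_000))  # 10000
print(rows_scanned("ephemeral", 1_000_000, n_refs=3))        # 3000000
```

Note how the ephemeral cost multiplies with the number of referencing models, which is why heavily reused logic is often better materialized as a table.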
As source_table grows, the work changes by materialization type.
| Input Size (n rows) | View/Table Operations | Incremental Operations | Ephemeral Operations |
|---|---|---|---|
| 10,000 | Scan 10,000 rows | Scan new rows only (e.g., 100) | Scan 10,000 rows each use |
| 100,000 | Scan 100,000 rows | Scan new rows only (e.g., 1,000) | Scan 100,000 rows each use |
| 1,000,000 | Scan 1,000,000 rows | Scan new rows only (e.g., 10,000) | Scan 1,000,000 rows each use |
Pattern observation: View and Table scan all data on every run (or query), so work grows linearly with total data size. Incremental scans only new data, so per-run work grows with the number of new rows, not the total size. Ephemeral repeats the full scan for each downstream reference, multiplying the linear cost.
Time Complexity: O(n)
Building or querying a view, table, or ephemeral model takes time roughly proportional to n, the total number of rows processed. An incremental run is closer to O(Δ), where Δ is the number of new or changed rows since the last run, which is typically much smaller than n.
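The difference compounds over repeated runs. A minimal sketch, assuming the source grows by a fixed `delta` rows per run (the numbers and growth model are illustrative assumptions):

```python
# Cumulative rows scanned over k scheduled runs, assuming source_table
# starts at n0 rows and gains `delta` rows per run. Illustrative model.

def total_scanned_table(n0, delta, k):
    # A full rebuild each run scans the whole (growing) table: O(k * n).
    return sum(n0 + i * delta for i in range(k))

def total_scanned_incremental(n0, delta, k):
    # The first run scans everything; later runs scan only new rows:
    # O(n + k * delta).
    return n0 + (k - 1) * delta

print(total_scanned_table(100_000, 1_000, 30))        # 3435000
print(total_scanned_incremental(100_000, 1_000, 30))  # 129000
```

Over a month of daily runs in this toy scenario, the incremental model scans roughly 26x fewer rows than a full rebuild, which is the linear-versus-delta trade-off in concrete terms.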
[X] Wrong: "Incremental materialization always processes all data like a table."
[OK] Correct: Incremental only processes new or changed rows, so it usually does less work than full table rebuilds.
Understanding how different materializations affect processing time helps you design efficient data pipelines and explain trade-offs clearly.
"What if we changed an incremental model to a full table rebuild every time? How would the time complexity change?"