Multi-source fan-in patterns in dbt - Time & Space Complexity
When combining data from many sources into one place, it is important to know how the work grows as sources increase.
We ask: How does the time to combine data change when we add more sources?
Analyze the time complexity of the following dbt model combining multiple sources.
with source1 as (
select * from {{ source('db1', 'table1') }}
),
source2 as (
select * from {{ source('db2', 'table2') }}
),
source3 as (
select * from {{ source('db3', 'table3') }}
)
select * from source1
union all
select * from source2
union all
select * from source3
This code reads from three different sources and combines all rows into one result.
Look for repeated work in the code.
- Primary operation: Reading and scanning each source table.
- How many times: Once per source, here 3 times, but could be more if more sources added.
As the number of sources grows, the total work grows by adding the work for each source.
| Number of Sources (n) | Approx. Operations |
|---|---|
| 3 | 3 times reading source tables |
| 10 | 10 times reading source tables |
| 100 | 100 times reading source tables |
Pattern observation: The work grows linearly as you add more sources.
Time Complexity: O(n)
This means the time to combine data grows directly in proportion to the number of sources.
[X] Wrong: "Adding more sources won't increase time much because they run in parallel."
[OK] Correct: Even if sources run in parallel, total work still adds up, and combining results takes time too.
Understanding how combining many data sources affects time helps you design efficient data pipelines and explain your choices clearly.
"What if we replaced union all with join operations? How would the time complexity change?"