Configuring sources in YAML in dbt - Performance & Efficiency
When we configure sources in YAML for dbt, we declare the raw tables that already exist in the warehouse, so dbt knows where our data comes from.
We want to understand how the time to process these configurations grows as we add more sources or tables.
Analyze the time complexity of this YAML source configuration snippet.
sources:
  - name: sales_db
    tables:
      - name: customers
      - name: orders
      - name: products
This snippet defines one source with three tables listed under it.
Look at what repeats when dbt reads this YAML configuration.
- Primary operation: Reading each table entry under a source.
- How many times: Once for each table listed in the source.
As you add more tables to a source, dbt reads each one in turn.
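The repetition described above can be sketched in Python. This is a minimal illustration, not dbt's actual parser: the `config` dictionary is a hypothetical stand-in for the already-parsed YAML, and `count_table_reads` is a helper invented here that counts one read per table entry.

```python
# Hypothetical stand-in for the parsed YAML snippet
# (an assumption, not dbt's internal representation).
config = {
    "sources": [
        {
            "name": "sales_db",
            "tables": [
                {"name": "customers"},
                {"name": "orders"},
                {"name": "products"},
            ],
        }
    ]
}


def count_table_reads(source):
    """Visit every table entry under one source and count the visits."""
    reads = 0
    for table in source.get("tables", []):
        _ = table["name"]  # the "read": touch each entry exactly once
        reads += 1
    return reads


reads = count_table_reads(config["sources"][0])
print(reads)  # 3, one read per table
```

The loop body runs once per table, so the number of reads always equals the number of entries under `tables`.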
| Input Size (n) | Approx. Operations |
|---|---|
| 10 tables | 10 reads |
| 100 tables | 100 reads |
| 1000 tables | 1000 reads |
Pattern observation: The work grows directly with the number of tables.
Time Complexity: O(n)
This means the time to process source configurations grows linearly with n, the number of tables: doubling the tables roughly doubles the work.
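To see the linear pattern for the input sizes in the table, here is a small sketch that counts operations instead of measuring wall-clock time. It assumes the simplified model of one operation per table entry; `make_source` and `count_reads` are hypothetical helpers, not part of dbt.

```python
def make_source(n_tables):
    """Build a hypothetical source with n_tables generated table entries."""
    return {
        "name": "sales_db",
        "tables": [{"name": f"table_{i}"} for i in range(n_tables)],
    }


def count_reads(source):
    """Count one operation per table entry (the simplified cost model)."""
    return sum(1 for _ in source["tables"])


for n in (10, 100, 1000):
    # Each line prints the input size followed by the operation count.
    print(n, count_reads(make_source(n)))
```

Under this model the operation count always equals n, which is what O(n) growth looks like in practice.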
[X] Wrong: "Adding more tables won't affect processing time much because YAML is just text."
[OK] Correct: Even though YAML is text, dbt must read and process each table entry, so more tables mean more work.
Understanding how configuration size affects processing helps you explain efficiency in real projects.
What if we added multiple sources each with many tables? How would the time complexity change?
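One way to explore this question is to count reads across several sources. The sketch below keeps the same assumption of one read per table; `total_reads` and the sample data are hypothetical. Under that assumption the total is the sum of the table counts, so with s sources of t tables each the work is proportional to s × t, which is still linear in the total number of tables.

```python
def total_reads(sources):
    """Count one read per table across every source."""
    reads = 0
    for source in sources:                        # outer loop: each source
        for _table in source.get("tables", []):   # inner loop: its tables
            reads += 1
    return reads


# Hypothetical project: 3 sources with 4 tables each.
sources = [
    {
        "name": f"source_{s}",
        "tables": [{"name": f"table_{t}"} for t in range(4)],
    }
    for s in range(3)
]

print(total_reads(sources))  # 12 = 3 sources * 4 tables
```

The nested loops can look quadratic at first glance, but each table is visited only once, so the cost tracks the overall table count rather than its square.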