Creating your own dbt package - Performance & Efficiency
When creating your own dbt package, it's important to understand how run time grows as you add more models or data: how does execution time change as the package size or the data size increases? Let's analyze the time complexity of the following dbt package structure.
```sql
-- models/my_package/model_a.sql
select * from source_table
```

```sql
-- models/my_package/model_b.sql
select * from {{ ref('model_a') }} where condition = 'value'
```

```sql
-- models/my_package/model_c.sql
select b.*
from {{ ref('model_b') }} as b
join another_table as t
    on b.key = t.key
```

(Note that dbt models should not end with a semicolon, and the join condition needs table aliases so that `key = key` isn't ambiguous.)
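dbt infers the run order from the `ref()` calls above. As a rough sketch of that idea (a hand-written dependency map and a simple topological sort, not dbt's actual parser or scheduler):

```python
# The ref() calls above imply this dependency map:
# model_b refs model_a, and model_c refs model_b.
deps = {
    "model_a": [],
    "model_b": ["model_a"],
    "model_c": ["model_b"],
}

def run_order(deps):
    """Simple topological sort: a model runs once all its parents have run."""
    done, order = set(), []
    while len(done) < len(deps):
        for model, parents in deps.items():
            if model not in done and all(p in done for p in parents):
                done.add(model)
                order.append(model)
    return order

print(run_order(deps))  # ['model_a', 'model_b', 'model_c']
```

Because each model depends directly on the previous one, there is only one valid order: a straight line through the DAG.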
This package has three models, each depending on the previous one and transforming the data step by step.
Look at what repeats when dbt runs this package.
- Primary operation: Running each model's SQL query once in order.
- How many times: Once per model, so three times here.
As you add more models to your package, dbt runs each one in sequence.
| Input Size (models) | Approx. Operations (queries run) |
|---|---|
| 3 | 3 |
| 10 | 10 |
| 100 | 100 |
Pattern observation: The total work grows directly with the number of models you have.
Time Complexity: O(n)
This means the time to run your package grows linearly with the number of models you include.
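A tiny simulation makes the linear pattern in the table concrete (this is a hypothetical stand-in for dbt's scheduler, assuming each model compiles to exactly one query):

```python
# Sketch: count how many queries a fully sequential dbt run issues.
# Assumption: one query per model, run once, in dependency order.

def queries_run(num_models: int) -> int:
    total = 0
    for _ in range(num_models):  # dbt walks the DAG model by model
        total += 1               # one query per model
    return total

for n in (3, 10, 100):
    print(n, queries_run(n))  # operations grow in lockstep with model count
```

Doubling the number of models doubles the number of queries, which is exactly what O(n) means here.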
[X] Wrong: "Adding more models won't affect run time much because dbt runs them fast."
[OK] Correct: Each model runs a query, so more models mean more queries and longer total run time.
Understanding how your dbt package scales helps you design efficient data workflows and shows you can think about performance as your projects grow.
"What if your models had multiple dependencies and some ran in parallel? How would that affect the time complexity?"
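One way to reason about that question: with enough parallel threads, total run time is bounded by the longest dependency chain (the DAG's depth), not the total model count. A sketch using a hypothetical four-model graph (not this package's actual manifest):

```python
# Sketch: under unlimited parallelism, run time scales with the longest
# ref() chain, not with the number of models. Hypothetical graph below.

deps = {
    "model_a": [],
    "model_b": ["model_a"],
    "model_c": ["model_a"],            # b and c share a parent, so they can run in parallel
    "model_d": ["model_b", "model_c"],
}

def depth(model: str) -> int:
    """Length of the longest dependency chain ending at this model."""
    return 1 + max((depth(p) for p in deps[model]), default=0)

critical_path = max(depth(m) for m in deps)
print(critical_path)  # 3 levels: a -> (b, c in parallel) -> d
```

Four models, but only three sequential "waves" of work. In practice dbt caps concurrency with its `threads` setting, so real run time lands somewhere between the critical path and the fully sequential case.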