Why models are the core of dbt - Performance Analysis
Start learning this pattern below
Jump into concepts and practice - no test required
We want to understand how the work done by dbt models grows as data size increases.
How does the time to build models change when the input data grows?
Analyze the time complexity of this simple dbt model SQL code.
-- models/my_model.sql
select
user_id,
count(*) as total_orders
from {{ ref('raw_orders') }}
group by user_id
This model groups raw orders by user and counts orders per user.
Look at what repeats as data grows.
- Primary operation: Scanning all rows in the raw_orders table.
- How many times: Once over all input rows to group and count.
As the number of rows in raw_orders grows, the work grows roughly the same amount.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 scans and counts |
| 100 | 100 scans and counts |
| 1000 | 1000 scans and counts |
Pattern observation: The work grows directly with the number of rows.
Time Complexity: O(n)
This means the time to build the model grows linearly with the input data size.
[X] Wrong: "The model runs instantly no matter how big the data is."
[OK] Correct: The model must process every row, so more data means more work and more time.
Knowing how model build time grows helps you explain performance and plan data workflows confidently.
"What if the model joined two large tables instead of one? How would the time complexity change?"
Practice
Solution
Step 1: Understand the purpose of models in dbt
Models are SQL files that define how raw data is transformed into clean, organized tables or views.Step 2: Identify the correct role from options
Only To transform raw data into useful tables or views describes transforming raw data into useful tables or views, which is the core function of models.Final Answer:
To transform raw data into useful tables or views -> Option AQuick Check:
Models transform data [OK]
- Confusing models with dashboards
- Thinking models store raw data unchanged
- Assuming models manage permissions
Solution
Step 1: Recall dbt model file requirements
dbt models are SQL files that contain SELECT statements to transform data.Step 2: Match file type and content
Onlymodels/my_model.sqlcontaining a SELECT statement uses a .sql file with a SELECT statement, which is correct for a dbt model.Final Answer:
models/my_model.sql containing a SELECT statement -> Option DQuick Check:
Model = SQL file with SELECT [OK]
- Using Python or text files for models
- Confusing config files with models
- Not including a SELECT statement in model files
SELECT user_id, COUNT(*) AS orders_count FROM raw.orders GROUP BY user_id
What will this model produce when run?
Solution
Step 1: Analyze the SQL query in the model
The query selects user_id and counts orders grouped by user_id, aggregating orders per user.Step 2: Determine the output of the model
The model will create a table or view showing each user_id with their total number of orders.Final Answer:
A table or view with user_id and their total order counts -> Option AQuick Check:
GROUP BY user_id with COUNT(*) = aggregated counts [OK]
- Ignoring GROUP BY and expecting raw data
- Thinking COUNT(*) causes errors
- Assuming counts are missing
SELECT customer_id, date, SUM(amount) AS total FROM sales GROUP BY customer_id
But dbt throws an error. What is the likely problem?
Solution
Step 1: Check SELECT and GROUP BY columns
SELECT has customer_id, date, and SUM(amount), but GROUP BY includes only customer_id.Step 2: Identify mismatch causing error
All non-aggregated columns in SELECT must be in GROUP BY. date is missing in GROUP BY, causing error.Final Answer:
The SELECT includes date but GROUP BY does not, causing mismatch -> Option BQuick Check:
GROUP BY columns must match SELECT non-aggregates [OK]
- Ignoring GROUP BY and SELECT column mismatch
- Thinking SUM() can't be used with GROUP BY
- Assuming WHERE clause is mandatory
Solution
Step 1: Identify how models transform data in dbt
Models are SQL files that transform raw data into organized tables, like monthly summaries.Step 2: Choose the option that uses dbt models correctly
Write a SQL model that selects sales data, groups by month, and calculates totals uses a SQL model to group and summarize sales by month, fitting dbt's core purpose.Final Answer:
Write a SQL model that selects sales data, groups by month, and calculates totals -> Option CQuick Check:
Models transform data with SQL for summaries [OK]
- Using external tools instead of dbt models
- Confusing YAML config with data transformation
- Ignoring dbt's SQL model workflow
