Bird
Raised Fist0
dbtdata~10 mins

Why models are the core of dbt - Visual Breakdown

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Concept Flow - Why models are the core of dbt
Write SQL SELECT query
Create a dbt model file (.sql)
dbt compiles model into SQL
dbt runs model to create table/view
Model becomes a reusable dataset
Other models can reference this model
Builds a dependency graph
Enables incremental, testing, documentation
Models in dbt are SQL files that define datasets. They are compiled and run to create tables or views, forming the core reusable data building blocks.
Execution Sample
dbt
select * from raw.sales
where sale_date >= '2024-01-01'
This SQL defines a model that filters sales data from January 1, 2024 onward.
Execution Table
StepActionInput/StateOutput/Result
1Write SQL queryRaw sales tableSQL query with filter on sale_date
2Save as model fileSQL queryFile models/sales_filtered.sql created
3dbt compileModel fileCompiled SQL ready for execution
4dbt runCompiled SQLTable/view sales_filtered created in warehouse
5Reference modelsales_filteredOther models can use sales_filtered as source
6Build dependency graphAll modelsGraph showing model dependencies
7Enable testing and docsModels and graphTests and docs generated for models
💡 All models compiled and run, dependency graph built, enabling testing and documentation
Variable Tracker
VariableStartAfter Step 2After Step 4Final
SQL queryNoneselect * from raw.sales where sale_date >= '2024-01-01'SameSame
Model fileNonemodels/sales_filtered.sql createdSameSame
Compiled SQLNoneNoneCompiled SQL readySame
Table/view in warehouseNoneNonesales_filtered createdSame
Dependency graphEmptyEmptyEmptyGraph with sales_filtered node
Key Moments - 3 Insights
Why do we write SQL files as models instead of running SQL directly?
Because dbt compiles and runs these SQL files to create reusable tables or views, enabling dependency management and testing, as shown in steps 1-4 of the execution table.
How does referencing one model in another help?
Referencing builds a dependency graph (step 6), so dbt knows the order to run models and can manage incremental updates and testing.
What happens after dbt runs a model?
The model creates a table or view in the data warehouse (step 4), which becomes a dataset other models or analysts can use.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is created after step 4?
AA table or view in the warehouse
BA compiled SQL file
CA dependency graph
DA test report
💡 Hint
Check the Output/Result column at step 4 in the execution table
At which step does dbt build the dependency graph?
AStep 3
BStep 6
CStep 5
DStep 7
💡 Hint
Look for 'Build dependency graph' in the Action column of the execution table
If you change the SQL query in the model file, which step must you run again to update the table?
AStep 1
BStep 3
CStep 4
DStep 7
💡 Hint
Step 4 runs the compiled SQL to create or update the table/view
Concept Snapshot
dbt models are SQL files defining datasets.
They compile into SQL run in your warehouse.
Models create tables or views.
Models build a dependency graph.
This enables testing, docs, and incremental builds.
Full Transcript
In dbt, models are the core because they are SQL files that define datasets. You write a SQL query and save it as a model file. dbt compiles this SQL and runs it to create tables or views in your data warehouse. These tables become reusable datasets. Other models can reference these datasets, which builds a dependency graph. This graph helps dbt run models in the right order and supports testing and documentation. The execution steps show writing SQL, saving as a model, compiling, running to create tables, referencing models, building the graph, and enabling tests and docs. Understanding this flow helps beginners see why models are central to dbt's power.

Practice

(1/5)
1. What is the main role of models in dbt?
easy
A. To transform raw data into useful tables or views
B. To store raw data without changes
C. To create visual dashboards
D. To manage user permissions

Solution

  1. Step 1: Understand the purpose of models in dbt

    Models are SQL files that define how raw data is transformed into clean, organized tables or views.
  2. Step 2: Identify the correct role from options

    Only To transform raw data into useful tables or views describes transforming raw data into useful tables or views, which is the core function of models.
  3. Final Answer:

    To transform raw data into useful tables or views -> Option A
  4. Quick Check:

    Models transform data [OK]
Hint: Models transform raw data into tables/views [OK]
Common Mistakes:
  • Confusing models with dashboards
  • Thinking models store raw data unchanged
  • Assuming models manage permissions
2. Which of the following is the correct way to define a model in dbt?
easy
A. models/my_model.yaml containing configuration only
B. models/my_model.py containing Python code
C. models/my_model.txt containing raw data
D. models/my_model.sql containing a SELECT statement

Solution

  1. Step 1: Recall dbt model file requirements

    dbt models are SQL files that contain SELECT statements to transform data.
  2. Step 2: Match file type and content

    Only models/my_model.sql containing a SELECT statement uses a .sql file with a SELECT statement, which is correct for a dbt model.
  3. Final Answer:

    models/my_model.sql containing a SELECT statement -> Option D
  4. Quick Check:

    Model = SQL file with SELECT [OK]
Hint: Models are SQL files with SELECT statements [OK]
Common Mistakes:
  • Using Python or text files for models
  • Confusing config files with models
  • Not including a SELECT statement in model files
3. Given this dbt model SQL code:
SELECT user_id, COUNT(*) AS orders_count FROM raw.orders GROUP BY user_id

What will this model produce when run?
medium
A. A table or view with user_id and their total order counts
B. A list of all orders without grouping
C. An error because COUNT(*) is invalid
D. A table with only user_id and no counts

Solution

  1. Step 1: Analyze the SQL query in the model

    The query selects user_id and counts orders grouped by user_id, aggregating orders per user.
  2. Step 2: Determine the output of the model

    The model will create a table or view showing each user_id with their total number of orders.
  3. Final Answer:

    A table or view with user_id and their total order counts -> Option A
  4. Quick Check:

    GROUP BY user_id with COUNT(*) = aggregated counts [OK]
Hint: GROUP BY with COUNT(*) gives totals per group [OK]
Common Mistakes:
  • Ignoring GROUP BY and expecting raw data
  • Thinking COUNT(*) causes errors
  • Assuming counts are missing
4. You wrote this dbt model SQL:
SELECT customer_id, date, SUM(amount) AS total FROM sales GROUP BY customer_id

But dbt throws an error. What is the likely problem?
medium
A. SUM(amount) cannot be used with GROUP BY
B. The SELECT includes date but GROUP BY does not, causing mismatch
C. customer_id should be aggregated with SUM()
D. Missing WHERE clause causes error

Solution

  1. Step 1: Check SELECT and GROUP BY columns

    SELECT has customer_id, date, and SUM(amount), but GROUP BY includes only customer_id.
  2. Step 2: Identify mismatch causing error

    All non-aggregated columns in SELECT must be in GROUP BY. date is missing in GROUP BY, causing error.
  3. Final Answer:

    The SELECT includes date but GROUP BY does not, causing mismatch -> Option B
  4. Quick Check:

    GROUP BY columns must match SELECT non-aggregates [OK]
Hint: SELECT non-aggregates must match GROUP BY columns [OK]
Common Mistakes:
  • Ignoring GROUP BY and SELECT column mismatch
  • Thinking SUM() can't be used with GROUP BY
  • Assuming WHERE clause is mandatory
5. You want to create a dbt model that builds a monthly sales summary table. Which approach best uses models as the core of dbt?
hard
A. Create a YAML file listing monthly sales without SQL
B. Manually export raw sales data and summarize in Excel
C. Write a SQL model that selects sales data, groups by month, and calculates totals
D. Use a Python script outside dbt to summarize sales

Solution

  1. Step 1: Identify how models transform data in dbt

    Models are SQL files that transform raw data into organized tables, like monthly summaries.
  2. Step 2: Choose the option that uses dbt models correctly

    Write a SQL model that selects sales data, groups by month, and calculates totals uses a SQL model to group and summarize sales by month, fitting dbt's core purpose.
  3. Final Answer:

    Write a SQL model that selects sales data, groups by month, and calculates totals -> Option C
  4. Quick Check:

    Models transform data with SQL for summaries [OK]
Hint: Use SQL models to transform and summarize data [OK]
Common Mistakes:
  • Using external tools instead of dbt models
  • Confusing YAML config with data transformation
  • Ignoring dbt's SQL model workflow