Bird
Raised Fist0
dbtdata~15 mins

Why models are the core of dbt - Why It Works This Way

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Overview - Why models are the core of dbt
What is it?
In dbt, models are SQL files that define how raw data is transformed into clean, organized tables or views. They are the main building blocks where you write the logic to shape your data. Models connect your source data to the final datasets used for analysis. Essentially, models are the heart of dbt projects because they control what data looks like and how it flows.
Why it matters
Without models, dbt would have no way to transform raw data into useful insights. Models solve the problem of messy, unorganized data by providing a clear, repeatable way to clean and structure it. Without this, analysts and data teams would spend too much time fixing data instead of using it. Models make data trustworthy and ready for decision-making.
Where it fits
Before learning about models, you should understand basic SQL and the concept of data transformation. After mastering models, you can explore advanced dbt features like tests, snapshots, and macros that build on models to improve data quality and automation.
Mental Model
Core Idea
Models are like recipes that transform raw ingredients (data) into a finished dish (clean tables) that everyone can enjoy and trust.
Think of it like...
Imagine you have a basket of raw vegetables (raw data). A model is like a cooking recipe that tells you how to wash, chop, and cook these vegetables to make a tasty meal (organized data) ready to serve.
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Raw Source  │ --> │   Model     │ --> │ Clean Table │
│   Data      │     │ (SQL Logic) │     │ (Output)    │
└─────────────┘     └─────────────┘     └─────────────┘
Build-Up - 7 Steps
1
FoundationUnderstanding raw data sources
🤔
Concept: Raw data is the starting point before any transformation.
Raw data comes from databases, files, or APIs and is often messy or unorganized. It might have duplicates, missing values, or inconsistent formats. Before using this data for analysis, it needs cleaning and structuring.
Result
You recognize that raw data is not ready for direct use and needs transformation.
Understanding raw data's imperfections explains why transformation is necessary.
2
FoundationWhat is a dbt model?
🤔
Concept: A dbt model is a SQL file that defines a transformation step.
In dbt, you write SQL SELECT statements inside model files. Each model creates a table or view in your data warehouse. Models are organized in folders and run in order based on dependencies.
Result
You can create a simple model that selects and filters data from a raw table.
Knowing that models are just SQL files helps demystify dbt and shows how it builds on familiar skills.
3
IntermediateHow models connect and build pipelines
🤔Before reading on: do you think models run independently or depend on each other? Commit to your answer.
Concept: Models can depend on other models, creating a chain of transformations.
You can reference one model inside another using the {{ ref() }} function. This creates a dependency graph where dbt runs models in the right order. This lets you build complex pipelines step-by-step.
Result
You see how multiple models combine to transform raw data into final tables.
Understanding model dependencies reveals how dbt manages complex workflows automatically.
4
IntermediateMaterializations: tables vs views in models
🤔Before reading on: do you think models always create tables or can they create views? Commit to your answer.
Concept: Models can create either tables or views depending on materialization settings.
By default, models create views, which are virtual tables that run the SQL each time queried. You can change materialization to 'table' to store results physically. This affects performance and storage.
Result
You understand how to control model output type for efficiency and cost.
Knowing materializations helps optimize data workflows and resource use.
5
IntermediateUsing Jinja templating inside models
🤔Before reading on: do you think models can include dynamic logic or are they static SQL? Commit to your answer.
Concept: dbt models support Jinja templating to add dynamic SQL generation.
You can use Jinja syntax like loops, conditions, and variables inside models. This lets you write reusable and flexible SQL code that adapts to different environments or inputs.
Result
You can create models that adjust logic without rewriting SQL manually.
Understanding templating unlocks powerful automation and reduces errors.
6
AdvancedIncremental models for large datasets
🤔Before reading on: do you think models always rebuild entire tables or can they update only new data? Commit to your answer.
Concept: Incremental models update only new or changed data instead of full rebuilds.
For very large tables, rebuilding everything is slow and costly. Incremental models use logic to add or update only recent rows. You define unique keys and filters to control this behavior.
Result
You can build efficient models that scale with data size.
Knowing incremental models is key to handling big data in production.
7
ExpertModel dependency graph and execution engine
🤔Before reading on: do you think dbt runs models randomly or follows a plan? Commit to your answer.
Concept: dbt builds a dependency graph to run models in the correct order automatically.
dbt parses all models and their references to create a directed acyclic graph (DAG). It then executes models respecting dependencies, parallelizing where possible. This ensures data consistency and efficient runs.
Result
You understand how dbt orchestrates complex transformations reliably.
Understanding the DAG and execution engine explains why dbt is powerful and reliable for data pipelines.
Under the Hood
dbt reads all model SQL files and parses the {{ ref() }} calls to build a dependency graph. This graph is a map of which models depend on others. When you run dbt, it uses this graph to decide the order of execution. Models are compiled into raw SQL with Jinja templating resolved, then sent to the data warehouse to create tables or views. The warehouse executes the SQL and stores results. dbt tracks metadata to know which models are fresh or need rebuilding.
Why designed this way?
dbt was designed to bring software engineering best practices to data transformation. Using models as modular SQL files with explicit dependencies makes pipelines transparent and maintainable. The DAG approach prevents errors from running models in the wrong order. Jinja templating adds flexibility without losing simplicity. This design balances power, clarity, and ease of use, unlike older monolithic ETL tools.
┌─────────────┐      ┌───────────────┐      ┌───────────────┐
│ Model Files │ ---> │ Dependency    │ ---> │ SQL Compiled  │
│ (SQL + Jinja)│      │ Graph (DAG)   │      │ & Executed in │
└─────────────┘      └───────────────┘      │ Data Warehouse│
                                              └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do models in dbt only create tables? Commit to yes or no.
Common Belief:Models always create physical tables in the database.
Tap to reveal reality
Reality:Models can create either tables or views depending on materialization settings.
Why it matters:Assuming models always create tables can lead to inefficient storage use and slower development cycles.
Quick: Do you think models run independently without order? Commit to yes or no.
Common Belief:Models run independently and can be executed in any order.
Tap to reveal reality
Reality:Models have dependencies and dbt runs them in a specific order based on the dependency graph.
Why it matters:Ignoring dependencies can cause errors or inconsistent data if models run before their inputs are ready.
Quick: Can you write any SQL in a model without restrictions? Commit to yes or no.
Common Belief:Models are just static SQL files without dynamic capabilities.
Tap to reveal reality
Reality:Models support Jinja templating to add dynamic logic and reuse.
Why it matters:Not using templating limits flexibility and leads to repetitive, error-prone code.
Quick: Do incremental models rebuild entire tables every run? Commit to yes or no.
Common Belief:All models rebuild the full dataset every time they run.
Tap to reveal reality
Reality:Incremental models update only new or changed data to save time and resources.
Why it matters:Not using incremental models on large data can cause slow runs and high costs.
Expert Zone
1
Model performance depends heavily on how SQL is written and how the warehouse optimizes queries, not just on dbt settings.
2
The dependency graph can become complex in large projects, requiring careful management to avoid circular dependencies.
3
Materializations can be customized beyond tables and views, including ephemeral models that never create physical objects but inline SQL.
When NOT to use
Models are not suitable for real-time or streaming data transformations; specialized tools like Apache Kafka or Spark Structured Streaming are better. Also, for very simple data tasks, direct SQL queries without dbt might be faster to prototype.
Production Patterns
In production, teams use models with version control, automated testing, and CI/CD pipelines. Incremental models handle large datasets efficiently. Teams also modularize models into layers (staging, intermediate, marts) for clarity and reuse.
Connections
Software Engineering Modularization
Models in dbt are like modules or functions in programming that encapsulate logic and can be composed.
Understanding modularization in software helps grasp why breaking transformations into models improves maintainability and collaboration.
Directed Acyclic Graphs (DAGs) in Project Management
The model dependency graph in dbt is a DAG, similar to task dependencies in project planning.
Knowing DAGs from project management clarifies how dbt schedules model runs to respect dependencies and avoid conflicts.
Cooking Recipes
Models are like recipes that transform raw ingredients into finished dishes.
This analogy helps beginners relate data transformation to everyday experiences of following step-by-step instructions.
Common Pitfalls
#1Running models without understanding dependencies causes errors.
Wrong approach:dbt run --models model_b model_a
Correct approach:dbt run --models model_a model_b
Root cause:Not recognizing that model_b depends on model_a leads to running them in the wrong order.
#2Using full table rebuilds for very large datasets wastes time and resources.
Wrong approach:materialized='table' without incremental logic on huge tables
Correct approach:materialized='incremental' with unique keys and filters
Root cause:Ignoring incremental models causes unnecessary full data processing.
#3Writing repetitive SQL without templating increases errors and maintenance.
Wrong approach:Copy-pasting similar SQL code across multiple models
Correct approach:Using Jinja macros and variables to reuse code
Root cause:Not leveraging dbt's templating features leads to duplicated effort.
Key Takeaways
Models are the core of dbt because they define how raw data is transformed into clean, usable tables or views.
They are simple SQL files enhanced with templating and dependency management to build reliable data pipelines.
Understanding model dependencies and materializations is key to efficient and correct data workflows.
Advanced features like incremental models and Jinja templating make dbt powerful for large-scale production use.
Mastering models unlocks the full potential of dbt for trustworthy, maintainable, and scalable data transformation.

Practice

(1/5)
1. What is the main role of models in dbt?
easy
A. To transform raw data into useful tables or views
B. To store raw data without changes
C. To create visual dashboards
D. To manage user permissions

Solution

  1. Step 1: Understand the purpose of models in dbt

    Models are SQL files that define how raw data is transformed into clean, organized tables or views.
  2. Step 2: Identify the correct role from options

    Only To transform raw data into useful tables or views describes transforming raw data into useful tables or views, which is the core function of models.
  3. Final Answer:

    To transform raw data into useful tables or views -> Option A
  4. Quick Check:

    Models transform data [OK]
Hint: Models transform raw data into tables/views [OK]
Common Mistakes:
  • Confusing models with dashboards
  • Thinking models store raw data unchanged
  • Assuming models manage permissions
2. Which of the following is the correct way to define a model in dbt?
easy
A. models/my_model.yaml containing configuration only
B. models/my_model.py containing Python code
C. models/my_model.txt containing raw data
D. models/my_model.sql containing a SELECT statement

Solution

  1. Step 1: Recall dbt model file requirements

    dbt models are SQL files that contain SELECT statements to transform data.
  2. Step 2: Match file type and content

    Only models/my_model.sql containing a SELECT statement uses a .sql file with a SELECT statement, which is correct for a dbt model.
  3. Final Answer:

    models/my_model.sql containing a SELECT statement -> Option D
  4. Quick Check:

    Model = SQL file with SELECT [OK]
Hint: Models are SQL files with SELECT statements [OK]
Common Mistakes:
  • Using Python or text files for models
  • Confusing config files with models
  • Not including a SELECT statement in model files
3. Given this dbt model SQL code:
SELECT user_id, COUNT(*) AS orders_count FROM raw.orders GROUP BY user_id

What will this model produce when run?
medium
A. A table or view with user_id and their total order counts
B. A list of all orders without grouping
C. An error because COUNT(*) is invalid
D. A table with only user_id and no counts

Solution

  1. Step 1: Analyze the SQL query in the model

    The query selects user_id and counts orders grouped by user_id, aggregating orders per user.
  2. Step 2: Determine the output of the model

    The model will create a table or view showing each user_id with their total number of orders.
  3. Final Answer:

    A table or view with user_id and their total order counts -> Option A
  4. Quick Check:

    GROUP BY user_id with COUNT(*) = aggregated counts [OK]
Hint: GROUP BY with COUNT(*) gives totals per group [OK]
Common Mistakes:
  • Ignoring GROUP BY and expecting raw data
  • Thinking COUNT(*) causes errors
  • Assuming counts are missing
4. You wrote this dbt model SQL:
SELECT customer_id, date, SUM(amount) AS total FROM sales GROUP BY customer_id

But dbt throws an error. What is the likely problem?
medium
A. SUM(amount) cannot be used with GROUP BY
B. The SELECT includes date but GROUP BY does not, causing mismatch
C. customer_id should be aggregated with SUM()
D. Missing WHERE clause causes error

Solution

  1. Step 1: Check SELECT and GROUP BY columns

    SELECT has customer_id, date, and SUM(amount), but GROUP BY includes only customer_id.
  2. Step 2: Identify mismatch causing error

    All non-aggregated columns in SELECT must be in GROUP BY. date is missing in GROUP BY, causing error.
  3. Final Answer:

    The SELECT includes date but GROUP BY does not, causing mismatch -> Option B
  4. Quick Check:

    GROUP BY columns must match SELECT non-aggregates [OK]
Hint: SELECT non-aggregates must match GROUP BY columns [OK]
Common Mistakes:
  • Ignoring GROUP BY and SELECT column mismatch
  • Thinking SUM() can't be used with GROUP BY
  • Assuming WHERE clause is mandatory
5. You want to create a dbt model that builds a monthly sales summary table. Which approach best uses models as the core of dbt?
hard
A. Create a YAML file listing monthly sales without SQL
B. Manually export raw sales data and summarize in Excel
C. Write a SQL model that selects sales data, groups by month, and calculates totals
D. Use a Python script outside dbt to summarize sales

Solution

  1. Step 1: Identify how models transform data in dbt

    Models are SQL files that transform raw data into organized tables, like monthly summaries.
  2. Step 2: Choose the option that uses dbt models correctly

    Write a SQL model that selects sales data, groups by month, and calculates totals uses a SQL model to group and summarize sales by month, fitting dbt's core purpose.
  3. Final Answer:

    Write a SQL model that selects sales data, groups by month, and calculates totals -> Option C
  4. Quick Check:

    Models transform data with SQL for summaries [OK]
Hint: Use SQL models to transform and summarize data [OK]
Common Mistakes:
  • Using external tools instead of dbt models
  • Confusing YAML config with data transformation
  • Ignoring dbt's SQL model workflow