dbt · data · ~15 mins

Why models are the core of dbt - Why It Works This Way

Overview - Why models are the core of dbt
What is it?
In dbt, models are SQL files that define how raw data is transformed into clean, organized tables or views. They are the main building blocks where you write the logic to shape your data. Models connect your source data to the final datasets used for analysis. Essentially, models are the heart of dbt projects because they control what data looks like and how it flows.
Why it matters
Without models, dbt would have no way to transform raw data into useful insights. Models solve the problem of messy, unorganized data by providing a clear, repeatable way to clean and structure it. Without this, analysts and data teams would spend too much time fixing data instead of using it. Models make data trustworthy and ready for decision-making.
Where it fits
Before learning about models, you should understand basic SQL and the concept of data transformation. After mastering models, you can explore advanced dbt features like tests, snapshots, and macros that build on models to improve data quality and automation.
Mental Model
Core Idea
Models are like recipes that transform raw ingredients (data) into a finished dish (clean tables) that everyone can enjoy and trust.
Think of it like...
Imagine you have a basket of raw vegetables (raw data). A model is like a cooking recipe that tells you how to wash, chop, and cook these vegetables to make a tasty meal (organized data) ready to serve.
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Raw Source  │ --> │   Model     │ --> │ Clean Table │
│   Data      │     │ (SQL Logic) │     │ (Output)    │
└─────────────┘     └─────────────┘     └─────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding raw data sources
Concept: Raw data is the starting point before any transformation.
Raw data comes from databases, files, or APIs and is often messy or unorganized. It might have duplicates, missing values, or inconsistent formats. Before using this data for analysis, it needs cleaning and structuring.
Result
You recognize that raw data is not ready for direct use and needs transformation.
Understanding raw data's imperfections explains why transformation is necessary.
2
Foundation: What is a dbt model?
Concept: A dbt model is a SQL file that defines a transformation step.
In dbt, you write SQL SELECT statements inside model files. Each model creates a table or view in your data warehouse. Models are organized in folders and run in order based on dependencies.
Result
You can create a simple model that selects and filters data from a raw table.
Knowing that models are just SQL files helps demystify dbt and shows how it builds on familiar skills.
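As a minimal sketch of this idea (the source table raw.customers and its columns are invented for illustration), a first model is nothing more than a SELECT statement saved as a .sql file:

```sql
-- models/staging/stg_customers.sql
-- A dbt model is just a SELECT; dbt wraps it in the DDL needed to
-- create a view or table named after the file (stg_customers).

select
    id as customer_id,
    lower(email) as email,   -- normalize inconsistent casing
    created_at
from raw.customers
where email is not null      -- filter out unusable rows
```

Running `dbt run` would then create a relation called stg_customers in your warehouse from this query.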
3
Intermediate: How models connect and build pipelines
🤔 Before reading on: do you think models run independently or depend on each other? Commit to your answer.
Concept: Models can depend on other models, creating a chain of transformations.
You can reference one model inside another using the {{ ref() }} function. This creates a dependency graph where dbt runs models in the right order. This lets you build complex pipelines step-by-step.
Result
You see how multiple models combine to transform raw data into final tables.
Understanding model dependencies reveals how dbt manages complex workflows automatically.
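A hedged sketch of chaining (the model names stg_customers and stg_orders are hypothetical upstream models): referencing them with {{ ref() }} is what records the dependency.

```sql
-- models/marts/customer_orders.sql
-- Each {{ ref('...') }} call adds an edge to the DAG, so dbt always
-- builds stg_customers and stg_orders before this model.

select
    c.customer_id,
    count(o.order_id) as order_count
from {{ ref('stg_customers') }} c
left join {{ ref('stg_orders') }} o
    on o.customer_id = c.customer_id
group by c.customer_id
```

Because the dependency lives in the SQL itself, you never hand-maintain a run order.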
4
Intermediate: Materializations: tables vs views in models
🤔 Before reading on: do you think models always create tables or can they create views? Commit to your answer.
Concept: Models can create either tables or views depending on materialization settings.
By default, models are materialized as views: virtual tables whose SQL runs each time they are queried. You can change the materialization to 'table' to store the results physically. This choice affects query performance and storage cost.
Result
You understand how to control model output type for efficiency and cost.
Knowing materializations helps optimize data workflows and resource use.
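A minimal sketch of overriding the default materialization with an in-file config() block (the model and column names are illustrative):

```sql
-- models/marts/customer_ltv.sql
-- materialized='table' stores the result physically instead of the
-- default 'view', trading storage for faster downstream queries.
{{ config(materialized='table') }}

select
    customer_id,
    sum(amount) as lifetime_value
from {{ ref('stg_orders') }}
group by customer_id
```

The same setting can also be applied to whole folders of models from dbt_project.yml.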
5
Intermediate: Using Jinja templating inside models
🤔 Before reading on: do you think models can include dynamic logic or are they static SQL? Commit to your answer.
Concept: dbt models support Jinja templating to add dynamic SQL generation.
You can use Jinja syntax like loops, conditions, and variables inside models. This lets you write reusable and flexible SQL code that adapts to different environments or inputs.
Result
You can create models that adjust logic without rewriting SQL manually.
Understanding templating unlocks powerful automation and reduces errors.
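As an illustrative sketch (the payment methods and the stg_payments model are made up), a Jinja loop can generate repetitive columns so the list, not the SQL, is what you edit:

```sql
-- models/marts/order_payments.sql
-- The for-loop expands into one sum(...) column per payment method
-- when dbt compiles the model; loop.last controls the trailing comma.
{% set payment_methods = ['credit_card', 'bank_transfer', 'gift_card'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end)
        as {{ method }}_amount{% if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('stg_payments') }}
group by order_id
```

Adding a fourth payment method becomes a one-word change instead of a copy-pasted column.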
6
Advanced: Incremental models for large datasets
🤔 Before reading on: do you think models always rebuild entire tables or can they update only new data? Commit to your answer.
Concept: Incremental models update only new or changed data instead of full rebuilds.
For very large tables, rebuilding everything is slow and costly. Incremental models use logic to add or update only recent rows. You define unique keys and filters to control this behavior.
Result
You can build efficient models that scale with data size.
Knowing incremental models is key to handling big data in production.
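A sketch of the pattern (the event table and its columns are hypothetical): on the first run dbt builds the full table; on later runs is_incremental() is true and only new rows are processed.

```sql
-- models/marts/fct_events.sql
{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

select event_id, user_id, event_type, occurred_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- {{ this }} refers to the already-built target table, so only rows
  -- newer than what is stored get selected on incremental runs
  where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}
```

The unique_key tells dbt how to merge changed rows instead of duplicating them.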
7
Expert: Model dependency graph and execution engine
🤔 Before reading on: do you think dbt runs models randomly or follows a plan? Commit to your answer.
Concept: dbt builds a dependency graph to run models in the correct order automatically.
dbt parses all models and their references to create a directed acyclic graph (DAG). It then executes models respecting dependencies, parallelizing where possible. This ensures data consistency and efficient runs.
Result
You understand how dbt orchestrates complex transformations reliably.
Understanding the DAG and execution engine explains why dbt is powerful and reliable for data pipelines.
Under the Hood
dbt reads all model SQL files and parses the {{ ref() }} calls to build a dependency graph. This graph is a map of which models depend on others. When you run dbt, it uses this graph to decide the order of execution. Models are compiled into raw SQL with Jinja templating resolved, then sent to the data warehouse to create tables or views. The warehouse executes the SQL and stores results. dbt tracks metadata to know which models are fresh or need rebuilding.
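To make the compilation step concrete (the database and schema names are hypothetical; they depend on your profile and project configuration), dbt roughly turns a ref() call into a fully qualified relation name wrapped in the right DDL:

```sql
-- What you write in models/marts/customer_orders.sql:
select * from {{ ref('stg_customers') }}

-- Roughly what dbt sends to the warehouse after compiling
-- a view-materialized model:
create view analytics.dbt_prod.customer_orders as (
    select * from analytics.dbt_prod.stg_customers
);
```

Because ref() resolves to environment-specific names at compile time, the same model code works unchanged in development and production schemas.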
Why is it designed this way?
dbt was designed to bring software engineering best practices to data transformation. Using models as modular SQL files with explicit dependencies makes pipelines transparent and maintainable. The DAG approach prevents errors from running models in the wrong order. Jinja templating adds flexibility without losing simplicity. This design balances power, clarity, and ease of use, unlike older monolithic ETL tools.
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│ Model Files  │ --> │ Dependency   │ --> │ SQL Compiled &   │
│ (SQL + Jinja)│     │ Graph (DAG)  │     │ Run in Warehouse │
└──────────────┘     └──────────────┘     └──────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do models in dbt only create tables? Commit to yes or no.
Common Belief: Models always create physical tables in the database.
Reality: Models can create either tables or views depending on materialization settings.
Why it matters: Assuming models always create tables can lead to inefficient storage use and slower development cycles.
Quick: Do you think models run independently without order? Commit to yes or no.
Common Belief: Models run independently and can be executed in any order.
Reality: Models have dependencies, and dbt runs them in a specific order based on the dependency graph.
Why it matters: Ignoring dependencies can cause errors or inconsistent data if models run before their inputs are ready.
Quick: Are models just static SQL with no dynamic logic? Commit to yes or no.
Common Belief: Models are just static SQL files without dynamic capabilities.
Reality: Models support Jinja templating to add dynamic logic and reuse.
Why it matters: Not using templating limits flexibility and leads to repetitive, error-prone code.
Quick: Do incremental models rebuild entire tables every run? Commit to yes or no.
Common Belief: All models rebuild the full dataset every time they run.
Reality: Incremental models update only new or changed data to save time and resources.
Why it matters: Not using incremental models on large data can cause slow runs and high costs.
Expert Zone
1
Model performance depends heavily on how SQL is written and how the warehouse optimizes queries, not just on dbt settings.
2
The dependency graph can become complex in large projects, requiring careful management to avoid circular dependencies.
3
Materializations can be customized beyond tables and views, including ephemeral models, which never create physical objects; their SQL is inlined into downstream models as a common table expression.
When NOT to use
Models are not suitable for real-time or streaming transformations; specialized tools such as Kafka Streams or Spark Structured Streaming are better there. For very simple one-off tasks, direct SQL queries without dbt may also be faster to prototype.
Production Patterns
In production, teams use models with version control, automated testing, and CI/CD pipelines. Incremental models handle large datasets efficiently. Teams also modularize models into layers (staging, intermediate, marts) for clarity and reuse.
Connections
Software Engineering Modularization
Models in dbt are like modules or functions in programming that encapsulate logic and can be composed.
Understanding modularization in software helps grasp why breaking transformations into models improves maintainability and collaboration.
Directed Acyclic Graphs (DAGs) in Project Management
The model dependency graph in dbt is a DAG, similar to task dependencies in project planning.
Knowing DAGs from project management clarifies how dbt schedules model runs to respect dependencies and avoid conflicts.
Cooking Recipes
Models are like recipes that transform raw ingredients into finished dishes.
This analogy helps beginners relate data transformation to everyday experiences of following step-by-step instructions.
Common Pitfalls
#1 Running a model without first building its upstream dependencies causes errors or stale data.
Wrong approach: dbt run --select model_b (when upstream model_a has not been built)
Correct approach: dbt run --select +model_b (the + prefix also builds all upstream models)
Root cause: Not recognizing that model_b depends on model_a, and that selecting a single model does not automatically build its parents.
#2 Using full table rebuilds for very large datasets wastes time and resources.
Wrong approach: materialized='table' without incremental logic on huge tables
Correct approach: materialized='incremental' with unique keys and filters
Root cause: Ignoring incremental models causes unnecessary full data processing.
#3 Writing repetitive SQL without templating increases errors and maintenance.
Wrong approach: Copy-pasting similar SQL code across multiple models
Correct approach: Using Jinja macros and variables to reuse code
Root cause: Not leveraging dbt's templating features leads to duplicated effort.
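Pitfall #3 can be avoided with a small macro. As a sketch (the macro name and column are made up), a repeated expression is defined once in a file under macros/:

```sql
-- macros/cents_to_dollars.sql
-- A macro captures a repeated expression in one place, so a fix to
-- the formula propagates to every model that calls it.
{% macro cents_to_dollars(column_name) %}
    round({{ column_name }} / 100.0, 2)
{% endmacro %}
```

Any model can then call it, e.g. `select {{ cents_to_dollars('amount_cents') }} as amount from {{ ref('stg_payments') }}`.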
Key Takeaways
Models are the core of dbt because they define how raw data is transformed into clean, usable tables or views.
They are simple SQL files enhanced with templating and dependency management to build reliable data pipelines.
Understanding model dependencies and materializations is key to efficient and correct data workflows.
Advanced features like incremental models and Jinja templating make dbt powerful for large-scale production use.
Mastering models unlocks the full potential of dbt for trustworthy, maintainable, and scalable data transformation.