dbt · data · ~15 mins

Why models are the core of dbt - Why It Works This Way

Overview - Why models are the core of dbt
What is it?
In dbt, models are SQL files that define how raw data is transformed into clean, organized tables or views. They are the main building blocks where you write the logic to shape your data. Models connect your source data to the final datasets used for analysis. Essentially, models are the heart of dbt projects because they control what data looks like and how it flows.
Why it matters
Without models, dbt would have no way to transform raw data into useful insights. Models solve the problem of messy, unorganized data by providing a clear, repeatable way to clean and structure it. Without this, analysts and data teams would spend too much time fixing data instead of using it. Models make data trustworthy and ready for decision-making.
Where it fits
Before learning about models, you should understand basic SQL and the concept of data transformation. After mastering models, you can explore advanced dbt features like tests, snapshots, and macros that build on models to improve data quality and automation.
Mental Model
Core Idea
Models are like recipes that transform raw ingredients (data) into a finished dish (clean tables) that everyone can enjoy and trust.
Think of it like...
Imagine you have a basket of raw vegetables (raw data). A model is like a cooking recipe that tells you how to wash, chop, and cook these vegetables to make a tasty meal (organized data) ready to serve.
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Raw Source  │ --> │   Model     │ --> │ Clean Table │
│   Data      │     │ (SQL Logic) │     │ (Output)    │
└─────────────┘     └─────────────┘     └─────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding raw data sources
Concept: Raw data is the starting point before any transformation.
Raw data comes from databases, files, or APIs and is often messy or unorganized. It might have duplicates, missing values, or inconsistent formats. Before using this data for analysis, it needs cleaning and structuring.
Result
You recognize that raw data is not ready for direct use and needs transformation.
Understanding raw data's imperfections explains why transformation is necessary.
2
Foundation: What is a dbt model?
Concept: A dbt model is a SQL file that defines a transformation step.
In dbt, you write SQL SELECT statements inside model files. Each model creates a table or view in your data warehouse. Models are organized in folders and run in order based on dependencies.
Result
You can create a simple model that selects and filters data from a raw table.
Knowing that models are just SQL files helps demystify dbt and shows how it builds on familiar skills.
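As a minimal sketch of this idea (the source table raw.customers and its columns are invented for illustration), a first model is nothing more than a SELECT statement saved as a .sql file:

```sql
-- models/staging/stg_customers.sql
-- A dbt model is just a SELECT; dbt wraps it in the DDL needed to
-- create a view or table named after the file (stg_customers).

select
    id as customer_id,
    lower(email) as email,   -- normalize inconsistent casing
    created_at
from raw.customers
where email is not null      -- filter out unusable rows
```

Running `dbt run` would then create a relation called stg_customers in your warehouse from this query.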
3
Intermediate: How models connect and build pipelines
🤔 Before reading on: do you think models run independently or depend on each other? Commit to your answer.
Concept: Models can depend on other models, creating a chain of transformations.
You can reference one model inside another using the {{ ref() }} function. This creates a dependency graph where dbt runs models in the right order. This lets you build complex pipelines step-by-step.
Result
You see how multiple models combine to transform raw data into final tables.
Understanding model dependencies reveals how dbt manages complex workflows automatically.
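A hedged sketch of chaining (the model names stg_customers and stg_orders are hypothetical upstream models): referencing them with {{ ref() }} is what records the dependency.

```sql
-- models/marts/customer_orders.sql
-- Each {{ ref('...') }} call adds an edge to the DAG, so dbt always
-- builds stg_customers and stg_orders before this model.

select
    c.customer_id,
    count(o.order_id) as order_count
from {{ ref('stg_customers') }} c
left join {{ ref('stg_orders') }} o
    on o.customer_id = c.customer_id
group by c.customer_id
```

Because the dependency lives in the SQL itself, you never hand-maintain a run order.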
4
Intermediate: Materializations: tables vs views in models
🤔 Before reading on: do you think models always create tables or can they create views? Commit to your answer.
Concept: Models can create either tables or views depending on materialization settings.
By default, models are materialized as views: virtual tables whose SQL runs each time they are queried. You can change the materialization to 'table' to store the results physically. This choice affects query performance and storage cost.
Result
You understand how to control model output type for efficiency and cost.
Knowing materializations helps optimize data workflows and resource use.
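A minimal sketch of overriding the default materialization with an in-file config() block (the model and column names are illustrative):

```sql
-- models/marts/customer_ltv.sql
-- materialized='table' stores the result physically instead of the
-- default 'view', trading storage for faster downstream queries.
{{ config(materialized='table') }}

select
    customer_id,
    sum(amount) as lifetime_value
from {{ ref('stg_orders') }}
group by customer_id
```

The same setting can also be applied to whole folders of models from dbt_project.yml.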
5
Intermediate: Using Jinja templating inside models
🤔 Before reading on: do you think models can include dynamic logic or are they static SQL? Commit to your answer.
Concept: dbt models support Jinja templating to add dynamic SQL generation.
You can use Jinja syntax like loops, conditions, and variables inside models. This lets you write reusable and flexible SQL code that adapts to different environments or inputs.
Result
You can create models that adjust logic without rewriting SQL manually.
Understanding templating unlocks powerful automation and reduces errors.
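As an illustrative sketch (the payment methods and the stg_payments model are made up), a Jinja loop can generate repetitive columns so the list, not the SQL, is what you edit:

```sql
-- models/marts/order_payments.sql
-- The for-loop expands into one sum(...) column per payment method
-- when dbt compiles the model; loop.last controls the trailing comma.
{% set payment_methods = ['credit_card', 'bank_transfer', 'gift_card'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end)
        as {{ method }}_amount{% if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('stg_payments') }}
group by order_id
```

Adding a fourth payment method becomes a one-word change instead of a copy-pasted column.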
6
Advanced: Incremental models for large datasets
🤔 Before reading on: do you think models always rebuild entire tables or can they update only new data? Commit to your answer.
Concept: Incremental models update only new or changed data instead of full rebuilds.
For very large tables, rebuilding everything is slow and costly. Incremental models use logic to add or update only recent rows. You define unique keys and filters to control this behavior.
Result
You can build efficient models that scale with data size.
Knowing incremental models is key to handling big data in production.
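A sketch of the pattern (the event table and its columns are hypothetical): on the first run dbt builds the full table; on later runs is_incremental() is true and only new rows are processed.

```sql
-- models/marts/fct_events.sql
{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

select event_id, user_id, event_type, occurred_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- {{ this }} refers to the already-built target table, so only rows
  -- newer than what is stored get selected on incremental runs
  where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}
```

The unique_key tells dbt how to merge changed rows instead of duplicating them.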
7
Expert: Model dependency graph and execution engine
🤔 Before reading on: do you think dbt runs models randomly or follows a plan? Commit to your answer.
Concept: dbt builds a dependency graph to run models in the correct order automatically.
dbt parses all models and their references to create a directed acyclic graph (DAG). It then executes models respecting dependencies, parallelizing where possible. This ensures data consistency and efficient runs.
Result
You understand how dbt orchestrates complex transformations reliably.
Understanding the DAG and execution engine explains why dbt is powerful and reliable for data pipelines.
Under the Hood
dbt reads all model SQL files and parses the {{ ref() }} calls to build a dependency graph. This graph is a map of which models depend on others. When you run dbt, it uses this graph to decide the order of execution. Models are compiled into raw SQL with Jinja templating resolved, then sent to the data warehouse to create tables or views. The warehouse executes the SQL and stores results. dbt tracks metadata to know which models are fresh or need rebuilding.
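To make the compilation step concrete (the database and schema names are hypothetical; they depend on your profile and project configuration), dbt roughly turns a ref() call into a fully qualified relation name wrapped in the right DDL:

```sql
-- What you write in models/marts/customer_orders.sql:
select * from {{ ref('stg_customers') }}

-- Roughly what dbt sends to the warehouse after compiling
-- a view-materialized model:
create view analytics.dbt_prod.customer_orders as (
    select * from analytics.dbt_prod.stg_customers
);
```

Because ref() resolves to environment-specific names at compile time, the same model code works unchanged in development and production schemas.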
Why is it designed this way?
dbt was designed to bring software engineering best practices to data transformation. Using models as modular SQL files with explicit dependencies makes pipelines transparent and maintainable. The DAG approach prevents errors from running models in the wrong order. Jinja templating adds flexibility without losing simplicity. This design balances power, clarity, and ease of use, unlike older monolithic ETL tools.
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│ Model Files  │ --> │ Dependency   │ --> │ SQL Compiled &   │
│ (SQL + Jinja)│     │ Graph (DAG)  │     │ Run in Warehouse │
└──────────────┘     └──────────────┘     └──────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do models in dbt only create tables? Commit to yes or no.
Common Belief: Models always create physical tables in the database.
Reality: Models can create either tables or views depending on materialization settings.
Why it matters: Assuming models always create tables can lead to inefficient storage use and slower development cycles.
Quick: Do you think models run independently without order? Commit to yes or no.
Common Belief: Models run independently and can be executed in any order.
Reality: Models have dependencies, and dbt runs them in a specific order based on the dependency graph.
Why it matters: Ignoring dependencies can cause errors or inconsistent data if models run before their inputs are ready.
Quick: Are models just static SQL with no dynamic logic? Commit to yes or no.
Common Belief: Models are just static SQL files without dynamic capabilities.
Reality: Models support Jinja templating to add dynamic logic and reuse.
Why it matters: Not using templating limits flexibility and leads to repetitive, error-prone code.
Quick: Do incremental models rebuild entire tables every run? Commit to yes or no.
Common Belief: All models rebuild the full dataset every time they run.
Reality: Incremental models update only new or changed data to save time and resources.
Why it matters: Not using incremental models on large data can cause slow runs and high costs.
Expert Zone
1
Model performance depends heavily on how SQL is written and how the warehouse optimizes queries, not just on dbt settings.
2
The dependency graph can become complex in large projects, requiring careful management to avoid circular dependencies.
3
Materializations can be customized beyond tables and views, including ephemeral models, which never create physical objects; their SQL is inlined into downstream models as a common table expression.
When NOT to use
Models are not suitable for real-time or streaming transformations; specialized tools such as Kafka Streams or Spark Structured Streaming are better there. For very simple one-off tasks, direct SQL queries without dbt may also be faster to prototype.
Production Patterns
In production, teams use models with version control, automated testing, and CI/CD pipelines. Incremental models handle large datasets efficiently. Teams also modularize models into layers (staging, intermediate, marts) for clarity and reuse.
Connections
Software Engineering Modularization
Models in dbt are like modules or functions in programming that encapsulate logic and can be composed.
Understanding modularization in software helps grasp why breaking transformations into models improves maintainability and collaboration.
Directed Acyclic Graphs (DAGs) in Project Management
The model dependency graph in dbt is a DAG, similar to task dependencies in project planning.
Knowing DAGs from project management clarifies how dbt schedules model runs to respect dependencies and avoid conflicts.
Cooking Recipes
Models are like recipes that transform raw ingredients into finished dishes.
This analogy helps beginners relate data transformation to everyday experiences of following step-by-step instructions.
Common Pitfalls
#1 Running a model without first building its upstream dependencies causes errors or stale data.
Wrong approach: dbt run --select model_b (when upstream model_a has not been built)
Correct approach: dbt run --select +model_b (the + prefix also builds all upstream models)
Root cause: Not recognizing that model_b depends on model_a, and that selecting a single model does not automatically build its parents.
#2 Using full table rebuilds for very large datasets wastes time and resources.
Wrong approach: materialized='table' without incremental logic on huge tables
Correct approach: materialized='incremental' with unique keys and filters
Root cause: Ignoring incremental models causes unnecessary full data processing.
#3 Writing repetitive SQL without templating increases errors and maintenance.
Wrong approach: Copy-pasting similar SQL code across multiple models
Correct approach: Using Jinja macros and variables to reuse code
Root cause: Not leveraging dbt's templating features leads to duplicated effort.
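Pitfall #3 can be avoided with a small macro. As a sketch (the macro name and column are made up), a repeated expression is defined once in a file under macros/:

```sql
-- macros/cents_to_dollars.sql
-- A macro captures a repeated expression in one place, so a fix to
-- the formula propagates to every model that calls it.
{% macro cents_to_dollars(column_name) %}
    round({{ column_name }} / 100.0, 2)
{% endmacro %}
```

Any model can then call it, e.g. `select {{ cents_to_dollars('amount_cents') }} as amount from {{ ref('stg_payments') }}`.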
Key Takeaways
Models are the core of dbt because they define how raw data is transformed into clean, usable tables or views.
They are simple SQL files enhanced with templating and dependency management to build reliable data pipelines.
Understanding model dependencies and materializations is key to efficient and correct data workflows.
Advanced features like incremental models and Jinja templating make dbt powerful for large-scale production use.
Mastering models unlocks the full potential of dbt for trustworthy, maintainable, and scalable data transformation.