0
0
dbtdata~15 mins

Creating your first model in dbt - Mechanics & Internals

Choose your learning style9 modes available
Overview - Creating your first model
What is it?
Creating your first model in dbt means writing a SQL query that transforms raw data into a clean, organized table or view. This model is a building block for your data analysis and reporting. It helps you shape data so it is easier to understand and use. Models in dbt are reusable and can depend on each other to build complex data pipelines.
Why it matters
Without models, raw data stays messy and hard to analyze, making decisions slow and error-prone. Models let you automate data cleaning and transformation, saving time and reducing mistakes. They create a clear, trusted source of data for everyone in your team. This makes your data work faster, more reliable, and easier to maintain.
Where it fits
Before creating your first model, you should understand basic SQL and have dbt installed with a connection to your data warehouse. After learning to create models, you will explore testing, documentation, and building complex data pipelines with multiple models.
Mental Model
Core Idea
A dbt model is a SQL query saved as a file that creates a table or view in your data warehouse, transforming raw data into clean, usable data.
Think of it like...
Creating a dbt model is like writing a recipe that turns raw ingredients into a finished dish. The recipe (SQL query) tells the kitchen (data warehouse) how to prepare the meal (clean data) step by step.
┌───────────────┐
│ Raw Data Table│
└──────┬────────┘
       │
       ▼
┌─────────────────────┐
│ dbt Model SQL Query  │
│ (Transformation)    │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│ Clean Data Table or  │
│ View in Warehouse   │
└─────────────────────┘
Build-Up - 6 Steps
1
FoundationUnderstanding dbt Models Basics
🤔
Concept: Learn what a dbt model is and how it relates to SQL queries and tables.
A dbt model is simply a SQL file saved in your project under the 'models' folder. When you run dbt, it runs these SQL queries and creates tables or views in your data warehouse. Each model file corresponds to one table or view. Models help organize your data transformations in a clear, repeatable way.
Result
You understand that a dbt model is a SQL file that creates a table or view when run.
Knowing that models are just SQL files helps you see dbt as a tool that organizes and runs your SQL transformations automatically.
2
FoundationSetting Up Your dbt Project
🤔
Concept: Learn how to create a new dbt project and connect it to your data warehouse.
Use the command 'dbt init my_project' to create a new project folder. Then edit the 'profiles.yml' file to add your data warehouse connection details like host, user, password, and database. This setup lets dbt know where to run your models.
Result
You have a working dbt project connected to your data warehouse, ready to run models.
Understanding the connection setup is crucial because dbt needs to know where to send your SQL queries to create tables.
3
IntermediateWriting Your First Model SQL Query
🤔Before reading on: Do you think your first model should select all columns or only some? Commit to your answer.
Concept: Learn to write a simple SQL SELECT statement as your first model.
Create a file 'models/my_first_model.sql'. Write a SQL query selecting columns from an existing raw data table, for example: SELECT id, name, created_at FROM raw.customers. This query defines what data your model will produce.
Result
A SQL file that defines a model selecting specific columns from raw data.
Choosing only needed columns helps keep your data clean and efficient, avoiding unnecessary data processing.
4
IntermediateRunning Your Model with dbt
🤔Before reading on: What do you think happens when you run 'dbt run'? Commit to your answer.
Concept: Learn how to execute your model and see the results in your data warehouse.
Run the command 'dbt run' in your project folder. dbt compiles your SQL files and runs them against your data warehouse, creating tables or views. Check your warehouse to see the new table created by your model.
Result
Your model's table or view appears in the data warehouse with the transformed data.
Seeing your model's output in the warehouse connects your SQL code to real data, reinforcing how dbt automates transformations.
5
AdvancedUsing Model Configurations
🤔Before reading on: Do you think dbt models create tables or views by default? Commit to your answer.
Concept: Learn to customize how dbt builds your model using configurations like materializations.
In your model SQL file, add a config block at the top: {{ config(materialized='view') }} or {{ config(materialized='table') }}. This tells dbt whether to create a table (stored data) or a view (virtual table). Choosing materialization affects performance and storage.
Result
Your model builds as a table or view based on your configuration.
Knowing materializations lets you optimize your data pipeline for speed, storage, and freshness.
6
ExpertUnderstanding Model Dependencies and DAG
🤔Before reading on: Do you think dbt runs models in any order or follows dependencies? Commit to your answer.
Concept: Learn how dbt manages model dependencies and runs them in the correct order using a Directed Acyclic Graph (DAG).
When one model selects from another model, dbt detects this dependency. It builds a DAG to know which models depend on others. When you run 'dbt run', it runs models in order so dependencies are ready first. This prevents errors and ensures data consistency.
Result
Models run in the correct order, respecting dependencies automatically.
Understanding the DAG helps you design complex pipelines without manual ordering, making your workflows reliable and scalable.
Under the Hood
dbt reads your SQL model files and compiles them into raw SQL queries tailored for your data warehouse dialect. It analyzes the SQL to detect references to other models, building a dependency graph. When you run dbt, it executes these queries in dependency order, creating tables or views in the warehouse. dbt also manages schema changes and incremental updates if configured.
Why designed this way?
dbt was designed to bring software engineering best practices to data transformation. By treating SQL queries as code and managing dependencies automatically, it solves the problem of messy, unorganized data pipelines. Alternatives like manual SQL scripts or ETL tools lacked version control, testing, and modularity, which dbt provides.
┌───────────────┐
│ Model SQL File│
└──────┬────────┘
       │
       ▼
┌─────────────────────┐
│ dbt Compiler        │
│ - Parses SQL        │
│ - Detects refs      │
│ - Builds DAG        │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│ Data Warehouse      │
│ - Runs SQL Queries  │
│ - Creates Tables/Views│
└─────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does dbt automatically create tables for models by default? Commit yes or no.
Common Belief:dbt always creates tables when you run a model.
Tap to reveal reality
Reality:By default, dbt creates views, not tables, unless you specify materialization as 'table'.
Why it matters:Assuming tables are created can lead to performance issues or confusion about data storage and refresh behavior.
Quick: Do you think dbt models can only select from raw tables, not other models? Commit your answer.
Common Belief:Models can only use raw tables as data sources, not other models.
Tap to reveal reality
Reality:Models can select from other models, creating a chain of transformations managed by dbt.
Why it matters:Not knowing this limits your ability to build modular, layered data pipelines.
Quick: Does running 'dbt run' execute models in the order they appear in the folder? Commit yes or no.
Common Belief:dbt runs models in alphabetical or folder order.
Tap to reveal reality
Reality:dbt runs models based on dependency order, ensuring upstream models run before downstream ones.
Why it matters:Misunderstanding this can cause errors if you rely on manual ordering.
Quick: Can you edit models directly in the data warehouse after dbt runs? Commit yes or no.
Common Belief:You can safely edit tables created by dbt directly in the warehouse.
Tap to reveal reality
Reality:Editing dbt-managed tables directly can cause inconsistencies because dbt overwrites them on the next run.
Why it matters:Direct edits can be lost or cause data errors, breaking the pipeline.
Expert Zone
1
dbt's dependency graph is built by parsing SQL references, so using dynamic SQL or macros can affect dependency detection and require explicit declarations.
2
Materializations like incremental models allow partial data updates, improving performance but requiring careful handling of unique keys and update logic.
3
dbt supports hooks and operations to run custom SQL before or after models, enabling advanced workflows like permissions or data validations.
When NOT to use
dbt models are not ideal for real-time streaming data transformations or complex procedural logic better handled by dedicated ETL tools or data processing frameworks like Apache Spark.
Production Patterns
In production, teams organize models into folders by domain, use tests to validate data quality, schedule dbt runs with orchestration tools, and use documentation features to maintain clear data lineage.
Connections
Software Engineering CI/CD
dbt applies software engineering practices like version control and automated runs to data transformations.
Understanding CI/CD helps grasp how dbt automates testing and deployment of data models, improving reliability.
Data Warehousing
dbt models transform raw data inside a data warehouse to create clean, analysis-ready tables.
Knowing data warehouse concepts clarifies why dbt models focus on SQL transformations and materializations.
Recipe Writing in Cooking
Both involve step-by-step instructions transforming raw ingredients into a finished product.
This cross-domain view highlights the importance of clear, repeatable instructions for consistent results.
Common Pitfalls
#1Running dbt without setting up the data warehouse connection.
Wrong approach:dbt run
Correct approach:Edit profiles.yml with correct connection details before running 'dbt run'.
Root cause:Not understanding that dbt needs connection info to execute SQL in the warehouse.
#2Writing a model SQL that selects from a non-existent table or model.
Wrong approach:SELECT * FROM raw.non_existing_table
Correct approach:SELECT * FROM raw.existing_table
Root cause:Not verifying source tables or models exist before referencing them.
#3Editing dbt-created tables directly in the warehouse after running models.
Wrong approach:UPDATE analytics.my_first_model SET column = 'value' WHERE id = 1;
Correct approach:Make changes by editing the model SQL and re-running dbt.
Root cause:Misunderstanding that dbt manages tables and overwrites them on runs.
Key Takeaways
A dbt model is a SQL file that transforms raw data into clean tables or views in your data warehouse.
Models are organized, reusable, and run automatically by dbt, saving time and reducing errors.
Understanding model dependencies lets you build complex pipelines that run in the correct order.
Configuring materializations affects how data is stored and refreshed, impacting performance.
dbt brings software engineering best practices to data transformation, making data pipelines reliable and maintainable.