dbtdata~15 mins

Creating your first model in dbt - Mechanics & Internals

Choose your learning style10 modes available

Learn Why Deep Visual Try Challenge Project Recall Time

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Overview - Creating your first model

What is it?

Creating your first model in dbt means writing a SQL query that transforms raw data into a clean, organized table or view. This model is a building block for your data analysis and reporting. It helps you shape data so it is easier to understand and use. Models in dbt are reusable and can depend on each other to build complex data pipelines.

Why it matters

Without models, raw data stays messy and hard to analyze, making decisions slow and error-prone. Models let you automate data cleaning and transformation, saving time and reducing mistakes. They create a clear, trusted source of data for everyone in your team. This makes your data work faster, more reliable, and easier to maintain.

Where it fits

Before creating your first model, you should understand basic SQL and have dbt installed with a connection to your data warehouse. After learning to create models, you will explore testing, documentation, and building complex data pipelines with multiple models.

Mental Model

Core Idea

A dbt model is a SQL query saved as a file that creates a table or view in your data warehouse, transforming raw data into clean, usable data.

Think of it like...

Creating a dbt model is like writing a recipe that turns raw ingredients into a finished dish. The recipe (SQL query) tells the kitchen (data warehouse) how to prepare the meal (clean data) step by step.

┌───────────────┐
│ Raw Data Table│
└──────┬────────┘
       │
       ▼
┌─────────────────────┐
│ dbt Model SQL Query  │
│ (Transformation)    │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│ Clean Data Table or  │
│ View in Warehouse   │
└─────────────────────┘

Build-Up - 6 Steps

FoundationUnderstanding dbt Models Basics

Concept: Learn what a dbt model is and how it relates to SQL queries and tables.

A dbt model is simply a SQL file saved in your project under the 'models' folder. When you run dbt, it runs these SQL queries and creates tables or views in your data warehouse. Each model file corresponds to one table or view. Models help organize your data transformations in a clear, repeatable way.

Result

You understand that a dbt model is a SQL file that creates a table or view when run.

Knowing that models are just SQL files helps you see dbt as a tool that organizes and runs your SQL transformations automatically.

FoundationSetting Up Your dbt Project

IntermediateWriting Your First Model SQL Query

IntermediateRunning Your Model with dbt

AdvancedUsing Model Configurations

ExpertUnderstanding Model Dependencies and DAG

Under the Hood

dbt reads your SQL model files and compiles them into raw SQL queries tailored for your data warehouse dialect. It analyzes the SQL to detect references to other models, building a dependency graph. When you run dbt, it executes these queries in dependency order, creating tables or views in the warehouse. dbt also manages schema changes and incremental updates if configured.

Why designed this way?

dbt was designed to bring software engineering best practices to data transformation. By treating SQL queries as code and managing dependencies automatically, it solves the problem of messy, unorganized data pipelines. Alternatives like manual SQL scripts or ETL tools lacked version control, testing, and modularity, which dbt provides.

┌───────────────┐
│ Model SQL File│
└──────┬────────┘
       │
       ▼
┌─────────────────────┐
│ dbt Compiler        │
│ - Parses SQL        │
│ - Detects refs      │
│ - Builds DAG        │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│ Data Warehouse      │
│ - Runs SQL Queries  │
│ - Creates Tables/Views│
└─────────────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does dbt automatically create tables for models by default? Commit yes or no.

Common Belief:dbt always creates tables when you run a model.

Tap to reveal reality

Quick: Do you think dbt models can only select from raw tables, not other models? Commit your answer.

Common Belief:Models can only use raw tables as data sources, not other models.

Tap to reveal reality

Quick: Does running 'dbt run' execute models in the order they appear in the folder? Commit yes or no.

Common Belief:dbt runs models in alphabetical or folder order.

Tap to reveal reality

Quick: Can you edit models directly in the data warehouse after dbt runs? Commit yes or no.

Common Belief:You can safely edit tables created by dbt directly in the warehouse.

Tap to reveal reality

Expert Zone

dbt's dependency graph is built by parsing SQL references, so using dynamic SQL or macros can affect dependency detection and require explicit declarations.

Materializations like incremental models allow partial data updates, improving performance but requiring careful handling of unique keys and update logic.

dbt supports hooks and operations to run custom SQL before or after models, enabling advanced workflows like permissions or data validations.

When NOT to use

dbt models are not ideal for real-time streaming data transformations or complex procedural logic better handled by dedicated ETL tools or data processing frameworks like Apache Spark.

Production Patterns

In production, teams organize models into folders by domain, use tests to validate data quality, schedule dbt runs with orchestration tools, and use documentation features to maintain clear data lineage.

Connections

Software Engineering CI/CD

dbt applies software engineering practices like version control and automated runs to data transformations.

Understanding CI/CD helps grasp how dbt automates testing and deployment of data models, improving reliability.

Data Warehousing

dbt models transform raw data inside a data warehouse to create clean, analysis-ready tables.

Knowing data warehouse concepts clarifies why dbt models focus on SQL transformations and materializations.

Recipe Writing in Cooking

Both involve step-by-step instructions transforming raw ingredients into a finished product.

This cross-domain view highlights the importance of clear, repeatable instructions for consistent results.

Common Pitfalls

#1Running dbt without setting up the data warehouse connection.

Wrong approach:dbt run

Correct approach:Edit profiles.yml with correct connection details before running 'dbt run'.

Root cause:Not understanding that dbt needs connection info to execute SQL in the warehouse.

#2Writing a model SQL that selects from a non-existent table or model.

Wrong approach:SELECT * FROM raw.non_existing_table

Correct approach:SELECT * FROM raw.existing_table

Root cause:Not verifying source tables or models exist before referencing them.

#3Editing dbt-created tables directly in the warehouse after running models.

Wrong approach:UPDATE analytics.my_first_model SET column = 'value' WHERE id = 1;

Correct approach:Make changes by editing the model SQL and re-running dbt.

Root cause:Misunderstanding that dbt manages tables and overwrites them on runs.

Key Takeaways

A dbt model is a SQL file that transforms raw data into clean tables or views in your data warehouse.

Models are organized, reusable, and run automatically by dbt, saving time and reducing errors.

Understanding model dependencies lets you build complex pipelines that run in the correct order.

Configuring materializations affects how data is stored and refreshed, impacting performance.

dbt brings software engineering best practices to data transformation, making data pipelines reliable and maintainable.

Practice

(1/5)

1. What is the main purpose of a dbt model?

easy

A. To write Python scripts for data analysis

B. To store raw data without changes

C. To create visual dashboards

D. To transform raw data into clean, usable tables

Creating your first model in dbt - Mechanics & Internals

Start learning this pattern below

Practice

Solution

Step 1: Understand the role of dbt models

Step 2: Compare options with this role

Final Answer:

Quick Check:

Solution

Step 1: Recall dbt model syntax

Step 2: Evaluate each option

Final Answer:

Quick Check:

Solution

Step 1: Understand what dbt run does

Step 2: Analyze the model SQL

Final Answer:

Quick Check:

Solution

Step 1: Recall GROUP BY rules

Step 2: Analyze the SELECT and GROUP BY columns

Final Answer:

Quick Check:

Solution

Step 1: Understand filtering on aggregated values

Step 2: Analyze each option

Final Answer:

Quick Check: