
Creating your first model in dbt - Visual Walkthrough

Concept Flow - Creating your first model
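1. Write a SQL SELECT query.
2. Save it as a .sql file in the models folder.
3. Run the dbt run command.
4. dbt compiles the SQL and runs it against the database.
5. The model is created as a table or view in the target database.
6. Check the model's output with a SQL query or inspect it in dbt docs.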
This flow shows how you write a SQL query, save it as a model file, run dbt to build it, and then check the resulting table or view.
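The flow above can be imitated in a few lines of Python, with the standard library's sqlite3 module standing in for the target database. This is a toy sketch, not how dbt is implemented: the run_models helper and the in-memory models mapping are hypothetical, the source table is named orders rather than raw.orders to keep everything in one SQLite database, and the only dbt behaviors mimicked are that a model is named after its file and that its SELECT is wrapped in DDL for a view (dbt's default materialization).

```python
import sqlite3
from pathlib import PurePosixPath

# Hypothetical project contents: file path -> the SELECT stored in that file.
models = {
    "models/customer_order_counts.sql": """
        select customer_id, count(*) as order_count
        from orders
        group by customer_id
    """,
}


def run_models(conn: sqlite3.Connection, models: dict[str, str]) -> list[str]:
    """Toy version of `dbt run`: build each model as a view named after its file."""
    built = []
    for path, select_sql in models.items():
        # models/customer_order_counts.sql -> customer_order_counts
        name = PurePosixPath(path).stem
        # Replace any previous version so edits to the SELECT take effect.
        conn.execute(f"drop view if exists {name}")
        # The model file holds only a SELECT; the surrounding DDL is generated here.
        conn.execute(f"create view {name} as {select_sql}")
        built.append(name)
    return built


conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, customer_id integer)")
conn.executemany("insert into orders values (?, ?)", [(1, 10), (2, 10), (3, 20)])

print(run_models(conn, models))  # ['customer_order_counts']
print(conn.execute("select * from customer_order_counts order by customer_id").fetchall())
# [(10, 2), (20, 1)]
```

In this sketch, dropping and re-creating the view on every run is what lets an edited SELECT take effect the next time the command runs.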
Execution Sample
-- models/customer_order_counts.sql
select
  customer_id,
  count(*) as order_count
from raw.orders
group by customer_id
This query counts the orders placed by each customer in the raw.orders table. Saved as a .sql file in the models folder, it becomes a dbt model.
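To see what the sample query produces, the sketch below runs it unchanged against a few made-up rows. SQLite (through Python's sqlite3 module) stands in for the warehouse, and attaching a second in-memory database named raw lets the raw.orders reference resolve as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database under the name "raw" so raw.orders resolves.
conn.execute("attach database ':memory:' as raw")
conn.execute("create table raw.orders (order_id integer, customer_id integer)")
conn.executemany(
    "insert into raw.orders values (?, ?)",
    [(1, 101), (2, 101), (3, 102), (4, 101), (5, 103)],  # made-up sample rows
)

# The model's SQL, exactly as it would appear in models/customer_order_counts.sql.
model_sql = """
select
  customer_id,
  count(*) as order_count
from raw.orders
group by customer_id
"""

# Sort in Python so the printed result is deterministic.
rows = sorted(conn.execute(model_sql).fetchall())
print(rows)  # [(101, 3), (102, 1), (103, 1)]
```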
Execution Table
Step | Action | Input/Code | Result/Output
1 | Write SQL query | select customer_id, count(*) as order_count from raw.orders group by customer_id | SQL query ready in model file
2 | Save file | models/customer_order_counts.sql | Model file saved
3 | Run dbt | dbt run | dbt compiles SQL and runs it
4 | dbt compiles | SQL with Jinja macros (if any) | Final SQL sent to database
5 | Database executes | Final SQL | Table or view created with customer_id and order_count
6 | Check output | select * from customer_order_counts limit 5 | Sample rows of aggregated order counts
7 | Exit | No more steps | Model created successfully
💡 Model created and data aggregated by customer_id, ready for analysis
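Step 4 is where Jinja gets rendered into plain SQL. As a rough illustration only, the sketch below imitates what compiling dbt's {{ ref('...') }} function achieves: each reference is replaced by a qualified relation name before the SQL reaches the database. Real dbt uses a full Jinja environment and adapter-specific naming and quoting; the regex, the compile_refs function, and the analytics schema name are assumptions made for this example.

```python
import re

# Matches {{ ref('model_name') }} or {{ ref("model_name") }}, allowing optional spaces.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def compile_refs(model_sql: str, target_schema: str) -> str:
    """Toy stand-in for dbt's compiler: replace each ref() call with schema.model_name."""
    return REF_PATTERN.sub(lambda m: f"{target_schema}.{m.group(1)}", model_sql)


raw_sql = "select * from {{ ref('customer_order_counts') }} where order_count > 1"
compiled = compile_refs(raw_sql, target_schema="analytics")  # hypothetical schema name
print(compiled)
# select * from analytics.customer_order_counts where order_count > 1
```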
Variable Tracker
Variable | Start | After Steps 1-2 | After Step 3 | Final
SQL Query | empty | select customer_id, count(*) as order_count from raw.orders group by customer_id | compiled and run by dbt | table/view created in database
Model File | none | models/customer_order_counts.sql created | used by dbt run | exists in models folder
Database Table/View | none | none | created by dbt run | customer_order_counts with aggregated data
Key Moments - 3 Insights
Why do we save the SQL query as a .sql file in the models folder?
dbt treats every .sql file in the models folder (the directory set by model-paths in dbt_project.yml, models by default) as a model and names the resulting table or view after the file, as shown in step 2 of the execution table.
What happens when we run 'dbt run'?
dbt compiles each model file, rendering any Jinja macros into plain SQL, and runs that SQL on the database to create tables or views, as seen in steps 3 to 5 of the execution table.
How do we check if the model was created correctly?
Query the new table or view directly, or run dbt docs generate followed by dbt docs serve to browse its description, columns, and lineage, as shown in step 6 of the execution table.
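A quick way to check a model is to query a few rows and confirm a property you expect; for customer_order_counts, each customer_id should appear exactly once. The sketch below uses made-up rows in SQLite (via Python's sqlite3 module) and builds the model as a table; the hand-written create table ... as statement is only a stand-in for the DDL dbt generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, customer_id integer)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, 7), (2, 7), (3, 8), (4, 9), (5, 9), (6, 9)],  # made-up sample rows
)
# Stand-in for the DDL dbt would generate for a table materialization.
conn.execute(
    "create table customer_order_counts as "
    "select customer_id, count(*) as order_count from orders group by customer_id"
)

# Step 6 of the execution table: look at a few rows.
sample = conn.execute("select * from customer_order_counts limit 5").fetchall()
print(sample)

# Sanity check: the aggregation should leave exactly one row per customer.
total_rows, distinct_customers = conn.execute(
    "select count(*), count(distinct customer_id) from customer_order_counts"
).fetchone()
assert total_rows == distinct_customers, "duplicate customer_id values in the model"
```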
Visual Quiz - 3 Questions
Test your understanding
Based on the execution table, what is the result after step 5?
AThe SQL query is saved as a model file
BThe database creates a table or view with aggregated data
Cdbt run command is executed
DThe model file is deleted
💡 Hint
Check the 'Result/Output' column in the step 5 row of the execution table.
At which step does dbt compile the SQL query?
AStep 1
BStep 2
CStep 4
DStep 6
💡 Hint
Look at the 'Action' and 'Result/Output' columns for step 4 in the execution table.
If you change the SQL query, which steps must you repeat to update the model?
ASteps 1 and 3
BStep 3 only
CStep 1 only
DStep 6
💡 Hint
Refer to steps 1 and 3 of the execution table, where the query is written and dbt run is executed.
Concept Snapshot
Creating your first dbt model:
1. Write a SQL SELECT query to define your data transformation.
2. Save it as a .sql file in the models folder.
3. Run 'dbt run' to compile and execute the SQL.
4. dbt creates a table or view in your database.
5. Query or document the model to verify output.
Full Transcript
To create your first model in dbt, you start by writing a SQL query that selects and transforms data. You save this query as a .sql file inside the models folder of your dbt project. Then you run the command 'dbt run', which compiles your SQL, applies any macros, and runs it on your database. This process creates a new table or view representing your model. Finally, you can check the model's output by querying the new table or using dbt's documentation tools. This step-by-step process helps you build reusable, tested data transformations easily.

Practice

1. What is the main purpose of a dbt model?
easy
A. To write Python scripts for data analysis
B. To store raw data without changes
C. To create visual dashboards
D. To transform raw data into clean, usable tables

Solution

  1. Step 1: Understand the role of dbt models

    A dbt model is usually a SQL SELECT statement saved as a .sql file; dbt builds it into a clean table or view that transforms raw data for analysis.
  2. Step 2: Compare options with this role

    Only option D describes transforming raw data into clean tables, which matches the purpose of dbt models.
  3. Final Answer:

    To transform raw data into clean, usable tables -> Option D
  4. Quick Check:

    dbt model purpose = transform raw data [OK]
Hint: Remember: dbt models clean and transform data [OK]
Common Mistakes:
  • Confusing models with dashboards
  • Thinking models store raw data unchanged
  • Assuming models are Python analysis scripts (most dbt models, including every model here, are SQL)
2. Which of the following is the correct way to define a simple dbt model SQL file?
easy
A. SELECT * FROM raw_data
B. CREATE MODEL my_model AS SELECT * FROM raw_data
C. dbt run SELECT * FROM raw_data
D. INSERT INTO model SELECT * FROM raw_data

Solution

  1. Step 1: Recall dbt model syntax

    A dbt model is a SQL SELECT statement saved as a .sql file in the models folder.
  2. Step 2: Evaluate each option

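    SELECT * FROM raw_data is a plain SELECT statement, which is all a model file needs. Option B uses CREATE MODEL, which is not dbt syntax; dbt generates the DDL that creates the table or view for you. Option C puts a dbt CLI command inside SQL, and option D writes data with INSERT, which a model file should not do.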
  3. Final Answer:

    SELECT * FROM raw_data -> Option A
  4. Quick Check:

    dbt model = simple SELECT statement [OK]
Hint: dbt models are just SELECT queries saved as files [OK]
Common Mistakes:
  • Using CREATE MODEL syntax (not valid in dbt)
  • Trying to run dbt commands inside SQL files
  • Using INSERT statements instead of SELECT
3. Given the following dbt model SQL code saved as models/my_first_model.sql:
SELECT id, name FROM raw_customers WHERE active = true
What will be the output when you run dbt run?
medium
A. Nothing happens because dbt run does not create models
B. An error because of missing CREATE TABLE statement
C. A new table or view named my_first_model with active customers only
D. The raw_customers table will be deleted

Solution

  1. Step 1: Understand what dbt run does

    dbt run builds every model: it wraps each model's SELECT in the DDL for its materialization (a view by default) and names the result after the file, here my_first_model.
  2. Step 2: Analyze the model SQL

    The model selects id and name from raw_customers where active is true, so the output table will contain only active customers.
  3. Final Answer:

    A new table or view named my_first_model with active customers only -> Option C
  4. Quick Check:

    dbt run creates model tables = filtered active customers [OK]
Hint: dbt run creates tables or views from your SELECT queries [OK]
Common Mistakes:
  • Expecting CREATE TABLE in model SQL
  • Thinking dbt deletes source tables
  • Believing dbt run does nothing
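The sketch below runs the model from this question against made-up rows in SQLite (via Python's sqlite3 module). The CREATE VIEW line is a hand-written stand-in for the DDL dbt generates; the model file itself would contain only the SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table raw_customers (id integer, name text, active boolean)")
conn.executemany(
    "insert into raw_customers values (?, ?, ?)",
    [(1, "Ana", True), (2, "Ben", False), (3, "Chen", True)],  # made-up rows
)

# Contents of models/my_first_model.sql: a plain SELECT with no CREATE statement.
model_sql = "SELECT id, name FROM raw_customers WHERE active = true"

# Conceptually what dbt does: wrap the SELECT in DDL named after the file.
conn.execute(f"CREATE VIEW my_first_model AS {model_sql}")
rows = conn.execute("SELECT * FROM my_first_model ORDER BY id").fetchall()
print(rows)  # [(1, 'Ana'), (3, 'Chen')]
```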
4. You wrote this dbt model SQL file named models/customer_summary.sql:
SELECT customer_id, order_id, COUNT(*) AS orders_count
FROM orders
GROUP BY customer_id
When you run dbt run, you get an error. What is the most likely cause?
medium
A. Missing a semicolon at the end of the SQL statement
B. The GROUP BY column does not match the SELECT columns
C. The SELECT statement is missing a FROM clause
D. The model file is not saved in the models folder

Solution

  1. Step 1: Recall GROUP BY rules

    When using GROUP BY, every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function such as COUNT.
  2. Step 2: Analyze the SELECT and GROUP BY columns

    SELECT has customer_id (in GROUP BY), order_id (neither aggregated nor grouped), and COUNT(*) (aggregated). Because order_id breaks the rule, strict databases such as PostgreSQL reject the query.
  3. Final Answer:

    The GROUP BY column does not match the SELECT columns -> Option B
  4. Quick Check:

    GROUP BY must include all non-aggregated SELECT columns [OK]
Hint: Ensure all non-aggregated SELECT columns are in GROUP BY [OK]
Common Mistakes:
  • Forgetting to include non-aggregated columns in GROUP BY
  • Assuming a semicolon is required (dbt model files should not end with one)
  • Saving model outside models folder
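There are two possible fixes, depending on what the model should report: drop order_id to count orders per customer, or group by both columns. SQLite, used here through Python's sqlite3 module with made-up rows, is lenient and would accept the original query without an error, so the sketch only runs the corrected versions; the ORDER BY clauses are added solely to make the printed output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, customer_id integer)")
conn.executemany("insert into orders values (?, ?)", [(1, 5), (2, 5), (3, 6)])

# Fix 1: remove the non-aggregated order_id column -> one row per customer.
per_customer = conn.execute(
    "SELECT customer_id, COUNT(*) AS orders_count FROM orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(per_customer)  # [(5, 2), (6, 1)]

# Fix 2: group by every non-aggregated column -> one row per (customer, order) pair.
per_order = conn.execute(
    "SELECT customer_id, order_id, COUNT(*) AS orders_count FROM orders "
    "GROUP BY customer_id, order_id ORDER BY customer_id, order_id"
).fetchall()
print(per_order)  # [(5, 1, 1), (5, 2, 1), (6, 3, 1)]
```

Fix 2 is valid SQL, but every count comes out as 1, which is rarely what a summary model is meant to show; fix 1 matches the model's name.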
5. You want to create a dbt model that shows the total sales per product category, but only for categories with total sales above 1000. Which SQL code correctly implements this in your model file?
hard
A. SELECT category, SUM(sales) AS total_sales FROM sales_data GROUP BY category HAVING SUM(sales) > 1000
B. SELECT category, SUM(sales) AS total_sales FROM sales_data WHERE total_sales > 1000 GROUP BY category
C. SELECT category, SUM(sales) AS total_sales FROM sales_data GROUP BY category HAVING total_sales > 1000
D. SELECT category, SUM(sales) AS total_sales FROM sales_data WHERE SUM(sales) > 1000 GROUP BY category

Solution

  1. Step 1: Understand filtering on aggregated values

    To filter groups by an aggregated value, use HAVING with the aggregate function. WHERE cannot do this because it filters individual rows before grouping, when the sums do not exist yet.
  2. Step 2: Analyze each option

    Option C uses HAVING total_sales > 1000. Some databases, such as MySQL, accept an output alias in HAVING, but others, such as PostgreSQL, reject it, so the option is not portable. Option A uses HAVING SUM(sales) > 1000, which works across SQL dialects. Options B and D put the filter in WHERE, which cannot reference aggregates (and in option B, total_sales does not exist yet when WHERE runs).
  3. Final Answer:

    SELECT category, SUM(sales) AS total_sales FROM sales_data GROUP BY category HAVING SUM(sales) > 1000 -> Option A
  4. Quick Check:

    Use HAVING with aggregate functions to filter groups [OK]
Hint: Use HAVING with aggregate functions, not WHERE [OK]
Common Mistakes:
  • Using WHERE to filter aggregated results
  • Using alias names in HAVING clause
  • Forgetting GROUP BY when aggregating
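The sketch below contrasts options A and D on made-up rows, using SQLite through Python's sqlite3 module. Option A keeps only the categories whose total exceeds 1000, while option D is rejected by the database because an aggregate appears in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales_data (category text, sales real)")
conn.executemany(
    "insert into sales_data values (?, ?)",
    [("books", 600), ("books", 700), ("toys", 400), ("toys", 300), ("games", 1500)],
)

# Option A: filter groups after aggregation with HAVING on the aggregate itself.
option_a = (
    "SELECT category, SUM(sales) AS total_sales FROM sales_data "
    "GROUP BY category HAVING SUM(sales) > 1000"
)
print(sorted(conn.execute(option_a).fetchall()))
# [('books', 1300.0), ('games', 1500.0)]

# Option D: an aggregate in WHERE is rejected, because WHERE runs before grouping.
option_d = (
    "SELECT category, SUM(sales) AS total_sales FROM sales_data "
    "WHERE SUM(sales) > 1000 GROUP BY category"
)
try:
    conn.execute(option_d)
except sqlite3.OperationalError as exc:
    print("Option D failed:", exc)
```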