Why dbt Transformed Data Transformation Workflows
📖 Scenario: Imagine you work in a company where data is collected from many sources. Before you can analyze it, you need to clean and organize it. This process is called data transformation. Traditionally, this was slow and hard to manage. Then, a tool called dbt changed how teams transform data.
🎯 Goal: You will create a simple example to understand how dbt helps transform data step-by-step, making workflows easier and clearer.
📋 What You'll Learn
Create a dictionary with raw data representing sales
Add a configuration variable to set a sales threshold
Use a comprehension to filter sales above the threshold
Print the filtered sales data
💡 Why This Matters
🌍 Real World
Companies collect data from many places. They need to clean and organize it before making decisions. dbt helps teams write clear, reusable steps to transform data quickly and safely.
💼 Career
Data analysts and engineers use dbt to build reliable data pipelines. Understanding how to filter and transform data is a key skill for these roles.
1. Create raw sales data dictionary
Create a dictionary called raw_sales with these exact entries: 'store1': 100, 'store2': 250, 'store3': 75, 'store4': 300
Hint
Use curly braces to create a dictionary and separate each store and sales value with a colon.
2. Set sales threshold configuration
Create a variable called sales_threshold and set it to 150
Hint
Just assign the number 150 to the variable sales_threshold.
3. Filter sales above threshold using comprehension
Create a new dictionary called filtered_sales using a dictionary comprehension that includes only items from raw_sales where the sales value is greater than sales_threshold
Hint
Use a dictionary comprehension with for store, sales in raw_sales.items() and an if condition.
4. Print the filtered sales data
Write a print statement to display the filtered_sales dictionary
Hint
Use print(filtered_sales) to show the result.
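A possible solution to the four steps is sketched below. The exercise's dictionary, comprehension, and print call are Python constructs, so the sketch is plain Python; it illustrates the "filter raw data by a configurable rule" idea rather than dbt itself.

```python
# Step 1: raw sales figures per store (values taken from the exercise)
raw_sales = {'store1': 100, 'store2': 250, 'store3': 75, 'store4': 300}

# Step 2: configuration value; only sales strictly above it are kept
sales_threshold = 150

# Step 3: dictionary comprehension that keeps only items above the threshold
filtered_sales = {store: sales for store, sales in raw_sales.items() if sales > sales_threshold}

# Step 4: display the result
print(filtered_sales)  # {'store2': 250, 'store4': 300}
```

Keeping the threshold in its own variable mirrors the exercise's "configuration" step: changing one value changes the filter without touching the comprehension.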
Practice
1. What is one main reason dbt changed how data transformation workflows are done?
easy
A. It breaks complex data tasks into smaller, clear steps called models.
B. It replaces SQL with a new programming language.
C. It removes the need for testing data transformations.
D. It stores data in a new type of database automatically.
Solution
Step 1: Understand dbt's approach to data workflows
dbt organizes data transformations into small, manageable pieces called models, making workflows clearer.
Step 2: Compare options to dbt's features
Only option A describes this key feature; the other options are incorrect or unrelated.
Final Answer:
It breaks complex data tasks into smaller, clear steps called models. -> Option A
Quick Check:
dbt uses models to simplify workflows = A [OK]
Hint: Remember: dbt splits work into models for clarity [OK]
Common Mistakes:
Thinking dbt replaces SQL
Believing dbt removes testing
Assuming dbt changes database types
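To make the "small, clear steps" idea concrete, the sketch below imitates two layered models with plain SQLite views driven from Python's sqlite3 module. It is not dbt itself: the table and model names (raw_orders, stg_orders, customer_revenue) and the data are hypothetical, and in a real project each SELECT would live in its own .sql file with dbt generating the surrounding CREATE statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw source table standing in for data loaded from an upstream system
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [(1, 10, "completed", 20.0), (2, 10, "cancelled", 5.0), (3, 11, "completed", 12.5), (4, 10, "completed", 7.5)],
)

# Model 1 (staging): a single small step that only cleans the raw data
conn.execute("""
    CREATE VIEW stg_orders AS
    SELECT order_id, customer_id, amount
    FROM raw_orders
    WHERE status = 'completed'
""")

# Model 2 (downstream): builds on the staging model rather than on raw data;
# in dbt this SELECT would reference the staging model via {{ ref('stg_orders') }}
conn.execute("""
    CREATE VIEW customer_revenue AS
    SELECT customer_id, SUM(amount) AS revenue
    FROM stg_orders
    GROUP BY customer_id
""")

rows = conn.execute("SELECT customer_id, revenue FROM customer_revenue ORDER BY customer_id").fetchall()
print(rows)  # [(10, 27.5), (11, 12.5)]
```

Because the second model depends only on the first, a fix to the cleaning rule (for example, which statuses count) happens in one place and flows to every downstream model.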
2. Which of the following is the correct way to define a model in dbt using SQL?
easy
A. dbt run my_model;
B. CREATE MODEL my_model AS SELECT * FROM source_table;
C. SELECT * FROM source_table;
D. DEFINE MODEL my_model SELECT * FROM source_table;
Solution
Step 1: Recall dbt model definition syntax
In dbt, a model is defined simply by writing a SQL SELECT statement in a .sql file.
Step 2: Evaluate each option
Option C is a plain SELECT statement, which is the correct form. The other options are invalid: CREATE MODEL and DEFINE MODEL are not valid model syntax, and dbt run is a command-line command, not SQL.
Final Answer:
SELECT * FROM source_table; -> Option C
Quick Check:
dbt models are SQL SELECT queries = C [OK]
Hint: dbt models are just SELECT queries saved as files [OK]
Common Mistakes:
Trying to use CREATE MODEL syntax
Using dbt commands inside SQL files
Adding extra keywords like DEFINE
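Option C works because a model file holds only a SELECT, and dbt supplies the surrounding DDL when it builds the model. The sketch below imitates that division of labor with a hypothetical materialize() helper over SQLite. It is a simplification under stated assumptions: the helper, its parameters, and the table names are invented for illustration, and real dbt materializations are adapter-specific and generate more elaborate SQL.

```python
import sqlite3

def materialize(conn, model_name, select_sql, materialized="view"):
    """Hypothetical helper: wrap a SELECT-only model in CREATE VIEW/TABLE DDL."""
    if materialized not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {materialized}")
    kind = "VIEW" if materialized == "view" else "TABLE"
    body = select_sql.strip().rstrip(";")  # drop a trailing semicolon before embedding
    conn.execute(f"DROP {kind} IF EXISTS {model_name}")  # rebuild the model on each run
    conn.execute(f"CREATE {kind} {model_name} AS {body}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO source_table VALUES (?, ?)", [(1, "a"), (2, "b")])

# The model "file" contains nothing but the SELECT from option C
model_sql = "SELECT * FROM source_table;"
materialize(conn, "my_model", model_sql)

print(conn.execute("SELECT * FROM my_model ORDER BY id").fetchall())  # [(1, 'a'), (2, 'b')]
```

Keeping the model as a bare SELECT is what lets the same query be built as a view or a table just by changing a setting.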
3. Given this dbt model SQL code:
SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id
What will be the output of this model?
medium
A. A table with each customer_id and their total number of orders.
B. A list of all orders without grouping.
C. An error because COUNT(*) cannot be used with GROUP BY.
D. A table with order_count but no customer_id.
Solution
Step 1: Analyze the SQL query
The query groups orders by customer_id and counts orders per customer.
Step 2: Determine the output structure
The output will have two columns: customer_id and order_count, showing total orders per customer.
Final Answer:
A table with each customer_id and their total number of orders. -> Option A
Quick Check:
GROUP BY customer_id with COUNT(*) = grouped counts [OK]
Hint: GROUP BY + COUNT(*) gives counts per group [OK]
Common Mistakes:
Thinking COUNT(*) can't be used with GROUP BY
Expecting ungrouped list
Missing customer_id in output
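Running the query from the question against a few hypothetical orders in an in-memory SQLite database confirms option A. The ORDER BY is added only to make the output order predictable; it is not part of the original model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
# Hypothetical sample data: customer 1 has three orders, customer 2 has one
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1), (3, 2), (4, 1)])

# One output row per customer_id, with COUNT(*) counting rows in each group
rows = conn.execute(
    "SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 3), (2, 1)]
```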
4. You wrote this dbt model SQL:
SELECT user_id, SUM(amount) AS total FROM sales
When running dbt, you get an error. What is the likely cause?
medium
A. dbt requires CREATE TABLE statements in models.
B. Missing GROUP BY clause for user_id in aggregation.
C. user_id is not a valid column name.
D. SUM(amount) cannot be used in dbt models.
Solution
Step 1: Identify the SQL error
user_id is not aggregated, so selecting it alongside SUM(amount) requires GROUP BY user_id; without that clause, databases such as PostgreSQL reject the query.
Step 2: Check options against SQL rules
Option B correctly identifies the missing GROUP BY clause as the cause of the error.
Final Answer:
Missing GROUP BY clause for user_id in aggregation. -> Option B
Quick Check:
Aggregations need GROUP BY for non-aggregated columns [OK]
Hint: Always add GROUP BY for columns outside aggregation [OK]
Common Mistakes:
Thinking SUM() is disallowed in dbt
Assuming column names cause error without checking
Expecting CREATE TABLE in dbt models
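The sketch below, using hypothetical rows in an in-memory SQLite database, contrasts the broken and fixed queries. One caveat: SQLite is unusually permissive here and accepts the broken query, collapsing everything into a single row with an arbitrary user_id, whereas PostgreSQL and most warehouses dbt targets raise an error. Either way, the result without GROUP BY is wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (1, 5.0), (2, 20.0)])

# Broken query: user_id is neither grouped nor aggregated.
# SQLite tolerates it and returns one row for the whole table.
broken = conn.execute("SELECT user_id, SUM(amount) AS total FROM sales").fetchall()
print(len(broken))  # 1

# Fixed query: GROUP BY produces one total per user
fixed = conn.execute(
    "SELECT user_id, SUM(amount) AS total FROM sales GROUP BY user_id ORDER BY user_id"
).fetchall()
print(fixed)  # [(1, 15.0), (2, 20.0)]
```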
5. You want to build a dbt model that calculates the average order value per customer but only for customers with more than 5 orders. Which SQL snippet correctly implements this in dbt?
hard
A. SELECT customer_id, AVG(order_value) AS avg_value FROM orders HAVING COUNT(*) > 5 GROUP BY customer_id
B. SELECT customer_id, AVG(order_value) AS avg_value FROM orders WHERE COUNT(*) > 5 GROUP BY customer_id
C. SELECT customer_id, AVG(order_value) AS avg_value FROM orders GROUP BY customer_id WHERE COUNT(*) > 5
D. SELECT customer_id, AVG(order_value) AS avg_value FROM orders GROUP BY customer_id HAVING COUNT(*) > 5
Solution
Step 1: Understand filtering after grouping
To filter groups by aggregate conditions, use HAVING after GROUP BY.
Step 2: Check SQL syntax correctness
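Option D correctly places HAVING COUNT(*) > 5 after GROUP BY customer_id. Options B and C use WHERE, which cannot filter on an aggregate such as COUNT(*) (option C also puts WHERE after GROUP BY), and option A puts HAVING before GROUP BY, which breaks the standard clause order.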
Final Answer:
SELECT customer_id, AVG(order_value) AS avg_value FROM orders GROUP BY customer_id HAVING COUNT(*) > 5 -> Option D
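Running option D against a few hypothetical rows in an in-memory SQLite database shows HAVING filtering groups after aggregation, and shows the database rejecting option B's aggregate inside WHERE. The table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_value REAL)")
# Hypothetical data: customer 1 has 6 orders, customer 2 has only 2
orders = [(1, v) for v in (10.0, 20.0, 30.0, 40.0, 50.0, 60.0)] + [(2, 100.0), (2, 200.0)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

query = """
    SELECT customer_id, AVG(order_value) AS avg_value
    FROM orders
    GROUP BY customer_id      -- first form one group per customer
    HAVING COUNT(*) > 5       -- then keep only groups with more than 5 orders
"""
print(conn.execute(query).fetchall())  # [(1, 35.0)]

# Option B: an aggregate in WHERE is rejected, because WHERE runs before grouping
try:
    conn.execute(
        "SELECT customer_id, AVG(order_value) FROM orders WHERE COUNT(*) > 5 GROUP BY customer_id"
    )
    where_rejected = False
except sqlite3.Error as exc:
    where_rejected = True
    print("rejected:", exc)
```

The general rule: WHERE filters individual rows before grouping, HAVING filters whole groups after aggregation.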