0
0
dbtdata~30 mins

Query profiling and optimization in dbt - Mini Project: Build & Apply

Choose your learning style9 modes available
Query Profiling and Optimization with dbt
📖 Scenario: You work as a data analyst for an online store. Your team uses dbt to manage SQL queries that prepare sales data for reporting. Some queries run slowly, and you want to profile and optimize them to improve performance.
🎯 Goal: You will create a dbt model with sales data, add a configuration to limit data for faster testing, optimize the SQL query by filtering and aggregating efficiently, and finally run the model to see the optimized output.
📋 What You'll Learn
Create a dbt model SQL file with a SELECT query from the raw_sales table
Add a config variable limit_rows to limit rows for profiling
Optimize the query by filtering sales with amount greater than 100 and grouping by product
Print the final aggregated sales per product
💡 Why This Matters
🌍 Real World
Data analysts and engineers often need to profile and optimize SQL queries in dbt to improve data pipeline speed and efficiency.
💼 Career
Skills in query optimization and dbt modeling are valuable for roles like data analyst, data engineer, and analytics engineer.
Progress0 / 4 steps
1
Create initial dbt model with raw sales data
Create a dbt model SQL file named sales_data.sql with a SELECT query that selects order_id, product_id, and amount from the table raw_sales.
dbt
Need a hint?

Use a simple SELECT statement to get the three columns from raw_sales.

2
Add a config variable to limit rows for profiling
Add a config variable named limit_rows and set it to 1000. Modify the SELECT query to limit the number of rows returned to limit_rows using a WHERE clause with row_number() over().
dbt
Need a hint?

Use a subquery with row_number() over () to assign row numbers and filter with WHERE rn <= limit_rows.

3
Optimize query by filtering and grouping sales
Modify the query to select only sales where amount > 100. Group the results by product_id and calculate the total sales amount as total_amount using SUM(amount). Keep the row limiting logic intact.
dbt
Need a hint?

Filter rows with amount > 100 inside the subquery, then group by product_id and sum the amounts.

4
Run the model and print the optimized sales output
Run the dbt model and print the resulting aggregated sales per product. Use print() to display the final query result as a table or list.
dbt
Need a hint?

Print the header and some example rows to show the aggregated sales per product.