0
0
dbtdata~30 mins

Clustering and partitioning in dbt - Mini Project: Build & Apply

Choose your learning style9 modes available
Clustering and Partitioning with dbt
📖 Scenario: You work as a data analyst for an online store. The store's sales data is large and growing fast. To make queries faster and easier, you want to organize the data by clustering and partitioning it in the database using dbt.
🎯 Goal: You will create a dbt model that partitions the sales data by order_date and clusters it by customer_id. This will help speed up queries that filter by date and customer.
📋 What You'll Learn
Create a dbt model SQL file named sales_partitioned.sql.
Partition the table by order_date using RANGE partitioning.
Cluster the table by customer_id.
Use a simple SELECT statement from the raw sales table.
Print the final SQL code to verify the partitioning and clustering syntax.
💡 Why This Matters
🌍 Real World
Large datasets in data warehouses can be slow to query. Partitioning and clustering organize data to speed up queries and reduce costs.
💼 Career
Data analysts and engineers use dbt to build efficient data models that improve performance and maintainability in analytics workflows.
Progress0 / 4 steps
1
Create the base sales model
Create a dbt model SQL file named sales_partitioned.sql. Write a SELECT statement that selects all columns from the table raw_sales. Assign this SQL to a variable called sales_sql.
dbt
Need a hint?

Use triple quotes for multi-line SQL string. Write a simple SELECT * query from raw_sales.

2
Add partitioning configuration
Create a dictionary called partition_config with keys partition_by and data_type. Set partition_by to 'order_date' and data_type to 'RANGE'.
dbt
Need a hint?

Create a dictionary with the exact keys and values for partitioning.

3
Add clustering configuration
Create a list called cluster_by with a single string element 'customer_id'.
dbt
Need a hint?

Create a list with the clustering column name.

4
Print the final dbt model SQL with partitioning and clustering
Print the final SQL string that includes the partitioning and clustering configuration. Use an f-string to combine sales_sql, partition_config['partition_by'], and cluster_by[0] in the format:
CREATE TABLE sales_partitioned PARTITION BY RANGE(order_date) CLUSTER BY customer_id AS followed by the sales_sql query.
dbt
Need a hint?

Use an f-string to build the final SQL string and print it.