0
0
dbtdata~30 mins

Semi-structured data handling (JSON) in dbt - Mini Project: Build & Apply

Choose your learning style9 modes available
Handling Semi-structured JSON Data in dbt
📖 Scenario: You work as a data analyst at an e-commerce company. Your team stores customer order details in a database where one column contains JSON data about each order's items. You want to extract useful information from this JSON to analyze product sales.
🎯 Goal: Build a dbt model that extracts product names and quantities from a JSON column and calculates the total quantity sold per product.
📋 What You'll Learn
Create a source table with a JSON column named order_items containing order details.
Define a config variable for filtering orders by a minimum quantity threshold.
Write a dbt model using SQL to parse the JSON and aggregate total quantities per product.
Output the aggregated results showing product names and total quantities.
💡 Why This Matters
🌍 Real World
Many companies store order or event details as JSON in databases. Extracting and analyzing this semi-structured data helps understand customer behavior and sales trends.
💼 Career
Data analysts and engineers often need to parse JSON data in SQL to prepare clean datasets for reporting and machine learning.
Progress0 / 4 steps
1
Create source table with JSON data
Create a source table called raw_orders with two columns: order_id (integer) and order_items (JSON). Insert exactly these two rows: (1, '[{"product": "T-shirt", "quantity": 2}, {"product": "Hat", "quantity": 1}]') and (2, '[{"product": "T-shirt", "quantity": 1}, {"product": "Shoes", "quantity": 3}]').
dbt
Need a hint?

Use create table to define the table and insert into to add the two rows with JSON strings exactly as shown.

2
Define minimum quantity threshold
In your dbt model SQL file, create a variable called min_quantity and set it to 2. This variable will be used to filter products with quantities greater or equal to this value.
dbt
Need a hint?

Use dbt Jinja syntax {% set min_quantity = 2 %} to define the variable.

3
Parse JSON and aggregate quantities
Write a SQL query in your dbt model that:
1. Selects order_id and unnests the order_items JSON array into rows with columns product and quantity.
2. Filters rows where quantity is greater or equal to min_quantity.
3. Groups by product and sums the quantity as total_quantity.
dbt
Need a hint?

Use jsonb_array_elements to unnest JSON arrays and cast quantities to integers for filtering and summing.

4
Display aggregated product quantities
Add a select statement to print the final aggregated table showing product and total_quantity columns.
dbt
Need a hint?

Run the query to see the aggregated product quantities filtered by the minimum quantity.