Semi-structured data handling (JSON) in dbt - Step-by-Step Execution

Concept Flow - Semi-structured data handling (JSON)
1. Load JSON data
2. Parse JSON fields
3. Extract nested values
4. Transform or filter data
5. Load into structured table
6. Use in SQL queries
This flow shows how JSON data is loaded, parsed, transformed, and stored in a structured format for querying.
Execution Sample
dbt
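-- json_extract_path_text is warehouse-specific (available in PostgreSQL and Amazon Redshift)
select
  id,
  -- walk data -> 'user' -> 'name' and return the value as text
  json_extract_path_text(data, 'user', 'name') as user_name,
  -- extracted values are text, so cast the age to int
  (json_extract_path_text(data, 'user', 'age'))::int as user_age
from raw_json_table
-- keep only users older than 30
where (json_extract_path_text(data, 'user', 'age'))::int > 30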
This code extracts user name and age from JSON data and filters users older than 30.
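To make the extract, cast, and filter steps concrete, the same logic can be sketched outside the warehouse. The Python below is only an illustration and not dbt code: it reuses the three sample rows from the Execution Table, and the function name `run_query` and its `min_age` parameter are hypothetical.

```python
import json

# Sample rows taken from the lesson's Execution Table: (id, raw JSON text).
raw_json_table = [
    (1, '{"user":{"name":"Alice","age":35}}'),
    (2, '{"user":{"name":"Bob","age":28}}'),
    (3, '{"user":{"name":"Carol","age":40}}'),
]

def run_query(rows, min_age=30):
    """Mimic the SELECT: extract name and age, cast age to int, keep age > min_age."""
    result = []
    for row_id, data in rows:
        user = json.loads(data)["user"]   # navigate to the nested 'user' object
        user_name = str(user["name"])     # the SQL function returns text
        user_age = int(user["age"])       # plays the role of the ::int cast
        if user_age > min_age:            # plays the role of the WHERE clause
            result.append({"id": row_id, "user_name": user_name, "user_age": user_age})
    return result

output_rows = run_query(raw_json_table)
for row in output_rows:
    print(row)
```

Only the rows for Alice and Carol are printed, matching the rows marked as included in the Execution Table.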
Execution Table
| Step | Action | Input Data | Extracted Values | Filter Result | Output Row |
| --- | --- | --- | --- | --- | --- |
| 1 | Read row 1 | {"user":{"name":"Alice","age":35}} | user_name=Alice, user_age=35 | 35 > 30 is True | Row included: id=1, user_name=Alice, user_age=35 |
| 2 | Read row 2 | {"user":{"name":"Bob","age":28}} | user_name=Bob, user_age=28 | 28 > 30 is False | Row excluded |
| 3 | Read row 3 | {"user":{"name":"Carol","age":40}} | user_name=Carol, user_age=40 | 40 > 30 is True | Row included: id=3, user_name=Carol, user_age=40 |
| 4 | End of data | No more rows | N/A | N/A | Query ends |
💡 All rows processed; the filter excludes users whose age is 30 or lower.
Variable Tracker
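| Variable | Start | After Row 1 | After Row 2 | After Row 3 | Final |
| --- | --- | --- | --- | --- | --- |
| id | N/A | 1 | 2 | 3 | 3 |
| user_name | N/A | Alice | Bob | Carol | Carol |
| user_age | N/A | 35 | 28 | 40 | 40 |
| output_rows_count | 0 | 1 | 1 | 2 | 2 |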
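The output_rows_count row of the Variable Tracker can be reproduced with a short loop. This is a minimal sketch in Python rather than SQL; the list `ages` and the name `history` are introduced here purely for illustration.

```python
# user_age values for rows 1 to 3, as listed in the Variable Tracker.
ages = [35, 28, 40]

output_rows_count = 0
history = []  # value of output_rows_count after each row is processed
for age in ages:
    if age > 30:  # same condition as the query's WHERE clause
        output_rows_count += 1
    history.append(output_rows_count)

print(history)  # [1, 1, 2]
```

The count stays at 1 after row 2 because that row fails the filter, and it reaches 2 only after row 3.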
Key Moments - 2 Insights
Why is the row with user age 28 excluded from the output?
Because the filter condition (the extracted age, cast to int, must be greater than 30) is false for age 28, as shown in step 2 of the Execution Table.
How does json_extract_path_text work with nested JSON?
It navigates the JSON keys one at a time, for example 'user' and then 'name', and returns the value found there as text, as seen in the Extracted Values column of the Execution Table.
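The key-by-key navigation can be sketched in a few lines of Python. The helper `extract_path_text` below is a hypothetical stand-in written for this illustration, not the warehouse function itself. Returning `None` for a missing key is an assumption that models SQL NULL; the exact behavior for missing paths depends on the warehouse.

```python
import json

def extract_path_text(data, *path):
    """Hypothetical stand-in for json_extract_path_text.

    Walks the keys one at a time and returns the value as text.
    Returns None (modeling SQL NULL, an assumption) when a key is missing.
    """
    value = json.loads(data)
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    # Nested objects and arrays are re-serialized as JSON text;
    # other values are converted with str() as a simplification.
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return str(value)

doc = '{"user":{"name":"Alice","age":35}}'
print(extract_path_text(doc, "user", "name"))   # Alice
print(extract_path_text(doc, "user", "age"))    # 35 (a string, not a number)
print(extract_path_text(doc, "user", "email"))  # None
```

Because the age comes back as the string "35", it has to be converted before a numeric comparison, which is why the lesson's query casts it with ::int.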
Visual Quiz - 3 Questions
Test your understanding
Looking at the Execution Table, what user_age is extracted at step 3?
A. 28
B. 40
C. 35
D. N/A
💡 Hint: Check the 'Extracted Values' column at step 3 in the Execution Table.
At which step does the filter condition evaluate to false?
A. Step 2
B. Step 1
C. Step 3
D. No step
💡 Hint: Look at the 'Filter Result' column in the Execution Table for the false condition.
If we change the filter to 'user_age > 25', how many rows would be included?
A. 1
B. 2
C. 3
D. 0
💡 Hint: Refer to output_rows_count in the Variable Tracker and consider the ages 35, 28, and 40.
Concept Snapshot
Semi-structured data handling (JSON) in dbt:
- Use json_extract_path_text() to get nested JSON values
- Cast extracted text to needed types (e.g., ::int)
- Filter or transform based on extracted values
- Load results into structured tables for SQL queries
- Enables querying JSON data like normal columns
Full Transcript
This lesson shows how to handle JSON data in dbt. We start by loading JSON rows. Then we extract nested values using json_extract_path_text. For example, we get user name and age from the JSON. We convert age to integer to filter users older than 30. The execution table shows each row read, values extracted, filter applied, and whether the row is included. The variable tracker follows variables like id, user_name, user_age, and counts output rows. Key moments clarify why some rows are excluded and how JSON extraction works. The quiz tests understanding of extracted values, filter steps, and effects of changing filter conditions. The snapshot summarizes the main steps to handle JSON data in dbt.