Semi-structured data handling (JSON) in dbt - Time & Space Complexity
When working with JSON data in dbt, it's important to understand how processing time grows as the data size increases, and in particular how the cost changes when extracting or transforming JSON fields.
Analyze the time complexity of the following dbt SQL snippet that extracts values from JSON.
```sql
select
    id,
    json_extract_path_text(data, 'user', 'name') as user_name
from {{ ref('raw_events') }}
```
This code extracts the "name" field nested inside the "user" object from a JSON column for each row.
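Outside the warehouse, the same per-row work can be sketched in plain Python with the standard `json` module (the function name `extract_user_name` is illustrative, and the NULL-on-malformed-JSON behavior mirrors what many warehouses do, though exact error handling varies by platform):

```python
import json

def extract_user_name(data: str):
    """Mirror json_extract_path_text(data, 'user', 'name'):
    parse one JSON document and pull out the nested user.name field."""
    try:
        doc = json.loads(data)          # the whole document is parsed per row
        return doc.get("user", {}).get("name")
    except json.JSONDecodeError:
        return None                     # treat malformed JSON as NULL

rows = [
    '{"user": {"name": "Ada"}}',
    '{"user": {"name": "Grace"}}',
    '{"user": {}}',                     # missing field -> None
]
names = [extract_user_name(r) for r in rows]  # one parse + extraction per row
```

The key point is visible in the list comprehension: the extraction function is called once for every row, which is exactly the repeated work the complexity analysis below counts.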
Look at what repeats as the input grows.
- Primary operation: Extracting JSON fields from each row.
- How many times: Once per row in the input table.
As the number of rows increases, the extraction runs more times.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 extractions |
| 100 | 100 extractions |
| 1000 | 1000 extractions |
Pattern observation: The work grows directly with the number of rows.
Time Complexity: O(n)
This means processing time grows linearly with the number of rows, assuming each JSON document stays roughly the same size.
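The table above can be verified with a toy counter rather than a benchmark. This is a minimal sketch, assuming a Python stand-in for the warehouse function; the counter simply records how many extraction calls happen for each input size:

```python
import json

calls = 0  # global counter of extraction operations

def extract(data: str):
    """Count one extraction per call, then do the actual work."""
    global calls
    calls += 1
    return json.loads(data).get("user", {}).get("name")

for n in (10, 100, 1000):
    calls = 0
    rows = ['{"user": {"name": "u%d"}}' % i for i in range(n)]
    _ = [extract(r) for r in rows]
    # after each pass, calls == n: work grows in lockstep with row count
```

Doubling the rows doubles the calls, which is the defining property of O(n) growth.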
[X] Wrong: "Extracting JSON fields is instant and does not depend on data size."
[OK] Correct: Each row requires parsing the JSON, so more rows mean more work and more time.
Understanding how JSON extraction scales helps you explain data processing costs clearly and shows you can reason about query performance.
"What if the JSON data was deeply nested with many levels? How would that affect the time complexity?"
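As a rough sketch of that follow-up question: nesting adds a second dimension of work, because the parser must still walk every level of each document. If d is the nesting depth, the per-row cost becomes roughly O(d), giving about O(n * d) overall, which is still linear in the number of rows. A toy illustration (the helper names `nested_doc` and `extract_leaf` are invented for this sketch):

```python
import json

def nested_doc(depth: int) -> str:
    """Build a JSON string nested `depth` levels deep: {"a": {"a": ... "leaf"}}."""
    doc = "leaf"
    for _ in range(depth):
        doc = {"a": doc}
    return json.dumps(doc)

def extract_leaf(data: str, depth: int):
    """Parsing touches every level, so per-row cost grows with depth."""
    node = json.loads(data)
    for _ in range(depth):    # descend one key per nesting level
        node = node["a"]
    return node
```

So deeper nesting makes each extraction slower, but as long as document depth does not grow with the table, the complexity in the number of rows remains O(n).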