Semi-structured data querying (JSON, Avro) in Snowflake - Time & Space Complexity
Start learning this pattern below
Jump into concepts and practice - no test required
When working with semi-structured data like JSON or Avro in Snowflake, it is important to understand how query time changes as data size grows.
We want to know how the number of operations grows when extracting or filtering nested data.
Analyze the time complexity of the following operation sequence.
SELECT
data:id AS user_id,
data:attributes:name AS user_name
FROM
users_table
WHERE
data:attributes:active = true;
This query extracts user ID and name from JSON stored in a column, filtering only active users.
Identify the API calls, resource provisioning, data transfers that repeat.
- Primary operation: Parsing and extracting JSON fields for each row.
- How many times: Once per row scanned in the table.
As the number of rows increases, the system must parse and extract data from each JSON object separately.
| Input Size (n) | Approx. API Calls/Operations |
|---|---|
| 10 | 10 JSON parses and extractions |
| 100 | 100 JSON parses and extractions |
| 1000 | 1000 JSON parses and extractions |
Pattern observation: The number of JSON parsing operations grows directly with the number of rows.
Time Complexity: O(n)
This means the time to query grows linearly as the number of rows with JSON data increases.
[X] Wrong: "Querying nested JSON fields is instant and does not depend on data size."
[OK] Correct: Each row's JSON must be parsed and filtered, so more rows mean more work and longer query time.
Understanding how semi-structured data queries scale helps you design efficient data models and write performant queries in real projects.
"What if we indexed the JSON fields or flattened the data into columns? How would the time complexity change?"
Practice
Solution
Step 1: Understand Snowflake data types
Snowflake uses specific data types for different data. VARIANT is designed for semi-structured data.Step 2: Identify the correct type for JSON/Avro
VARIANT can store JSON, Avro, XML, and other semi-structured formats directly.Final Answer:
VARIANT -> Option BQuick Check:
Semi-structured data type = VARIANT [OK]
- Choosing VARCHAR which stores plain text only
- Confusing INTEGER or BOOLEAN with semi-structured types
- Thinking JSON needs special external storage
name from a VARIANT column data containing JSON in Snowflake as a string?Solution
Step 1: Understand JSON field extraction syntax in Snowflake
Snowflake uses colon:to access JSON keys inside VARIANT columns.Step 2: Cast extracted value to string for proper type
Using::stringcasts the extracted value to string, which is often needed for correct results.Final Answer:
data:name::string -> Option CQuick Check:
Extract and cast JSON key = data:name::string [OK]
- Using incorrect arrow syntax like data->'name'
- Not casting extracted value to string
- Using bracket notation
data['name']without casting to string
data: {"user": {"id": 101, "active": true}} What will the query SELECT data:user:id::int FROM users; return?Solution
Step 1: Access nested JSON key
The query accessesuserobject thenidkey inside it.Step 2: Cast the extracted value to integer
The::intcast converts the value 101 to integer type.Final Answer:
101 -> Option AQuick Check:
Nested JSON id cast to int = 101 [OK]
- Returning string "101" without cast
- Confusing boolean true with id value
- Getting NULL due to wrong key access
SELECT data:user:active FROM users; but get NULL values even though the JSON has "active": true. What is the likely cause?Solution
Step 1: Check data type of column
If the column is not VARIANT, JSON path extraction returns NULL.Step 2: Confirm correct key path and case
The key pathuser:activeis correct and JSON keys are case-sensitive but here lowercase matches JSON.Final Answer:
Column data is not VARIANT type -> Option DQuick Check:
Non-VARIANT column returns NULL on JSON path [OK]
- Assuming missing cast causes NULL for boolean
- Using wrong key path syntax
- Ignoring data type of the column
data storing JSON arrays like {"items": [{"id": 1}, {"id": 2}, {"id": 3}]}. Which query correctly extracts all id values from the items array as separate rows?Solution
Step 1: Use FLATTEN to expand JSON array
FLATTEN function explodes the array into rows, each with avaluefield.Step 2: Extract
Accessidfrom eachvalueand cast to intvalue:idand cast to integer for each row.Final Answer:
SELECT value:id::int FROM users, LATERAL FLATTEN(input => data:items); -> Option AQuick Check:
Use FLATTEN with LATERAL and extract id from value [OK]
- Trying to access array elements without FLATTEN
- Using incorrect syntax like data:items:id
- Not casting extracted values to int
