0
0
Snowflakecloud~10 mins

Semi-structured data querying (JSON, Avro) in Snowflake - Step-by-Step Execution

Choose your learning style9 modes available
Process Flow - Semi-structured data querying (JSON, Avro)
Load semi-structured data into VARIANT column
Use Snowflake SQL functions to parse data
Extract fields using dot notation or functions
Filter or transform data as needed
Return structured query results
This flow shows how Snowflake loads semi-structured data, parses it, extracts fields, and returns query results.
Execution Sample
Snowflake
SELECT
  data:name AS name,
  data:age AS age
FROM people_json
WHERE data:age > 30;
This query extracts 'name' and 'age' from JSON data stored in a VARIANT column and filters for age over 30.
Process Table
StepActionInput DataOperationResult
1Read table 'people_json'{"name": "Alice", "age": 25}Load VARIANT columnRow with VARIANT JSON data
2Read table 'people_json'{"name": "Bob", "age": 35}Load VARIANT columnRow with VARIANT JSON data
3Extract 'name' and 'age' from VARIANTRow with VARIANT JSON dataUse dot notation data:name, data:agename='Alice', age=25 for first row
4Extract 'name' and 'age' from VARIANTRow with VARIANT JSON dataUse dot notation data:name, data:agename='Bob', age=35 for second row
5Apply filter WHERE data:age > 30name='Alice', age=25Check age > 30False, exclude row
6Apply filter WHERE data:age > 30name='Bob', age=35Check age > 30True, include row
7Return query resultsFiltered rowsOutput selected columnsResult: name='Bob', age=35
💡 Query ends after filtering all rows and returning matching results.
Status Tracker
VariableStartAfter Row 1After Row 2Final
dataVARIANT column with JSON{"name": "Alice", "age": 25}{"name": "Bob", "age": 35}N/A
nameN/AAliceBobBob
ageN/A253535
filter_resultN/AFalseTrueTrue
Key Moments - 3 Insights
Why does the query exclude Alice's row even though her data is loaded?
Because the filter condition data:age > 30 is False for Alice (age 25), so her row is excluded as shown in execution_table step 5.
How does Snowflake access fields inside the JSON stored in VARIANT?
Snowflake uses dot notation like data:name or functions to extract fields from the VARIANT column, as shown in steps 3 and 4.
What type of data does the VARIANT column hold?
VARIANT holds semi-structured data like JSON or Avro, allowing flexible schema storage, as seen in step 1 and 2.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 5, what is the filter_result for Alice's row?
ATrue
BFalse
CNull
DError
💡 Hint
Check the 'filter_result' column in variable_tracker after Row 1 and execution_table step 5.
At which step does the query exclude a row based on age?
AStep 5
BStep 4
CStep 3
DStep 7
💡 Hint
Look at execution_table where filtering happens using WHERE clause.
If the filter changed to data:age > 20, which rows would be included?
AOnly Bob's row
BOnly Alice's row
CBoth Alice's and Bob's rows
DNo rows
💡 Hint
Refer to variable_tracker ages and filter condition logic.
Concept Snapshot
Snowflake stores semi-structured data in VARIANT columns.
Use dot notation (data:field) to extract JSON or Avro fields.
Apply SQL filters on extracted fields.
Query returns structured results from semi-structured data.
No schema needed upfront; flexible querying.
Ideal for JSON, Avro, XML data formats.
Full Transcript
This lesson shows how Snowflake handles semi-structured data like JSON or Avro stored in VARIANT columns. The data is loaded as-is without fixed schema. Using SQL dot notation, fields inside the JSON are extracted. Then filters like WHERE data:age > 30 select rows. The execution table traces each step: loading rows, extracting fields, filtering, and returning results. Variables like 'name' and 'age' track extracted values. Key moments clarify why some rows are excluded and how dot notation works. The quiz tests understanding of filtering and data extraction steps. This approach lets you query flexible data formats easily in Snowflake.