Bird
Raised Fist0
Snowflakecloud~30 mins

Semi-structured data querying (JSON, Avro) in Snowflake - Mini Project: Build & Apply

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Querying Semi-Structured JSON Data in Snowflake
📖 Scenario: You work as a data analyst at a retail company. The company stores customer order details in a Snowflake table as JSON data. You need to extract specific information from this semi-structured JSON data to generate reports.
🎯 Goal: Build a Snowflake SQL query that extracts customer names and their order amounts from a JSON column.
📋 What You'll Learn
Create a table called orders with a data column of type VARIANT to store JSON.
Insert sample JSON data with customer names and order amounts.
Write a query to extract customer_name and order_amount from the JSON data.
Use Snowflake's LATERAL FLATTEN function to handle arrays in JSON.
💡 Why This Matters
🌍 Real World
Many companies store data in semi-structured formats like JSON in cloud data warehouses. Querying this data efficiently is essential for reporting and analytics.
💼 Career
Data analysts and engineers often need to extract and transform JSON data stored in Snowflake to generate business insights.
Progress0 / 4 steps
1
Create the orders table with a data column
Write a Snowflake SQL statement to create a table called orders with one column named data of type VARIANT.
Snowflake
Hint

The VARIANT data type in Snowflake stores semi-structured data like JSON.

2
Insert sample JSON data into the orders table
Insert two rows into the orders table. Each row's data column should contain JSON with keys customer_name and orders. The orders key holds an array of objects with order_id and order_amount. Use these exact JSON values:

Row 1 JSON: {"customer_name": "Alice", "orders": [{"order_id": 101, "order_amount": 250}, {"order_id": 102, "order_amount": 450}]}
Row 2 JSON: {"customer_name": "Bob", "orders": [{"order_id": 103, "order_amount": 300}]}
Snowflake
Hint

Use PARSE_JSON() to convert JSON strings into VARIANT data type for insertion.

3
Write a query to extract customer names and order amounts
Write a Snowflake SQL query that selects customer_name and order_amount from the orders table. Use LATERAL FLATTEN on the orders array inside the JSON data column to access each order object. Use the aliases o for the table and f for the flatten function. Extract customer_name from o.data and order_amount from f.value.
Snowflake
Hint

Use LATERAL FLATTEN to expand the orders array and extract each order's amount.

4
Complete the query with ordering by customer name and order amount
Add an ORDER BY clause to the query to sort the results first by customer_name ascending, then by order_amount descending.
Snowflake
Hint

Use ORDER BY customer_name ASC, order_amount DESC to sort the results.

Practice

(1/5)
1. What is the Snowflake data type used to store semi-structured data like JSON or Avro?
easy
A. INTEGER
B. VARIANT
C. VARCHAR
D. BOOLEAN

Solution

  1. Step 1: Understand Snowflake data types

    Snowflake uses specific data types for different data. VARIANT is designed for semi-structured data.
  2. Step 2: Identify the correct type for JSON/Avro

    VARIANT can store JSON, Avro, XML, and other semi-structured formats directly.
  3. Final Answer:

    VARIANT -> Option B
  4. Quick Check:

    Semi-structured data type = VARIANT [OK]
Hint: Remember VARIANT stores JSON/Avro data in Snowflake [OK]
Common Mistakes:
  • Choosing VARCHAR which stores plain text only
  • Confusing INTEGER or BOOLEAN with semi-structured types
  • Thinking JSON needs special external storage
2. Which of the following is the correct way to extract the value of the key name from a VARIANT column data containing JSON in Snowflake as a string?
easy
A. data:name
B. data['name']
C. data:name::string
D. data->'name'

Solution

  1. Step 1: Understand JSON field extraction syntax in Snowflake

    Snowflake uses colon : to access JSON keys inside VARIANT columns.
  2. Step 2: Cast extracted value to string for proper type

    Using ::string casts the extracted value to string, which is often needed for correct results.
  3. Final Answer:

    data:name::string -> Option C
  4. Quick Check:

    Extract and cast JSON key = data:name::string [OK]
Hint: Use colon and cast (::string) to get JSON string value [OK]
Common Mistakes:
  • Using incorrect arrow syntax like data->'name'
  • Not casting extracted value to string
  • Using bracket notation data['name'] without casting to string
3. Given the JSON data stored in a VARIANT column data:
{"user": {"id": 101, "active": true}}
What will the query SELECT data:user:id::int FROM users; return?
medium
A. 101
B. "101"
C. true
D. NULL

Solution

  1. Step 1: Access nested JSON key

    The query accesses user object then id key inside it.
  2. Step 2: Cast the extracted value to integer

    The ::int cast converts the value 101 to integer type.
  3. Final Answer:

    101 -> Option A
  4. Quick Check:

    Nested JSON id cast to int = 101 [OK]
Hint: Use colon to access nested keys and cast to int for numbers [OK]
Common Mistakes:
  • Returning string "101" without cast
  • Confusing boolean true with id value
  • Getting NULL due to wrong key access
4. You run the query SELECT data:user:active FROM users; but get NULL values even though the JSON has "active": true. What is the likely cause?
medium
A. Missing cast to BOOLEAN
B. JSON key is case-sensitive and should be capitalized
C. Incorrect key path syntax
D. Column data is not VARIANT type

Solution

  1. Step 1: Check data type of column

    If the column is not VARIANT, JSON path extraction returns NULL.
  2. Step 2: Confirm correct key path and case

    The key path user:active is correct and JSON keys are case-sensitive but here lowercase matches JSON.
  3. Final Answer:

    Column data is not VARIANT type -> Option D
  4. Quick Check:

    Non-VARIANT column returns NULL on JSON path [OK]
Hint: Ensure column is VARIANT type to query JSON paths [OK]
Common Mistakes:
  • Assuming missing cast causes NULL for boolean
  • Using wrong key path syntax
  • Ignoring data type of the column
5. You have a VARIANT column data storing JSON arrays like
{"items": [{"id": 1}, {"id": 2}, {"id": 3}]}
. Which query correctly extracts all id values from the items array as separate rows?
hard
A. SELECT value:id::int FROM users, LATERAL FLATTEN(input => data:items);
B. SELECT data:items:id FROM users;
C. SELECT data:items[0]:id FROM users;
D. SELECT FLATTEN(data:items):id FROM users;

Solution

  1. Step 1: Use FLATTEN to expand JSON array

    FLATTEN function explodes the array into rows, each with a value field.
  2. Step 2: Extract id from each value and cast to int

    Access value:id and cast to integer for each row.
  3. Final Answer:

    SELECT value:id::int FROM users, LATERAL FLATTEN(input => data:items); -> Option A
  4. Quick Check:

    Use FLATTEN with LATERAL and extract id from value [OK]
Hint: Use LATERAL FLATTEN to turn JSON arrays into rows [OK]
Common Mistakes:
  • Trying to access array elements without FLATTEN
  • Using incorrect syntax like data:items:id
  • Not casting extracted values to int