Bird
Raised Fist0
Snowflakecloud~5 mins

Semi-structured data querying (JSON, Avro) in Snowflake - Cheat Sheet & Quick Revision

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Recall & Review
beginner
What is semi-structured data?
Semi-structured data is data that does not follow a strict table format but has some organizational properties like tags or keys, for example, JSON or Avro files.
Click to reveal answer
beginner
How does Snowflake store JSON data?
Snowflake stores JSON data in a VARIANT column type, which allows flexible storage and querying of nested JSON structures.
Click to reveal answer
intermediate
Which Snowflake function extracts a value from a JSON object?
The function is called GET_PATH or the simpler : operator, for example, data:field extracts the value of 'field' from the JSON stored in 'data'.
Click to reveal answer
intermediate
What is Avro and why is it used?
Avro is a compact, fast, binary data serialization format often used for big data. It supports schema evolution and is efficient for storage and transport.
Click to reveal answer
intermediate
How do you query nested JSON arrays in Snowflake?
You use the LATERAL FLATTEN function to expand arrays into rows, allowing you to query each element individually.
Click to reveal answer
Which Snowflake data type is best for storing JSON data?
AVARIANT
BVARCHAR
CINTEGER
DBOOLEAN
What does the Snowflake function LATERAL FLATTEN do?
AConverts JSON arrays into multiple rows
BCompresses JSON data
CEncrypts data
DJoins two tables
Avro files are primarily used because they are:
APlain text and human-readable
BBinary and support schema evolution
COnly for images
DUnstructured data format
How do you extract a nested field 'name' from a JSON column 'data' in Snowflake?
Adata.name
Bdata->name
Cdata:name
Ddata[name]
Which of the following is NOT a characteristic of semi-structured data?
AOften stored as JSON or Avro
BFlexible schema
CContains nested structures
DFixed schema with strict columns
Explain how Snowflake handles querying JSON data stored in a VARIANT column.
Think about how you would get data from a nested folder in your computer.
You got /4 concepts.
    Describe the advantages of using Avro format for big data storage and querying.
    Consider why a small, flexible box is better for shipping many items.
    You got /4 concepts.

      Practice

      (1/5)
      1. What is the Snowflake data type used to store semi-structured data like JSON or Avro?
      easy
      A. INTEGER
      B. VARIANT
      C. VARCHAR
      D. BOOLEAN

      Solution

      1. Step 1: Understand Snowflake data types

        Snowflake uses specific data types for different data. VARIANT is designed for semi-structured data.
      2. Step 2: Identify the correct type for JSON/Avro

        VARIANT can store JSON, Avro, XML, and other semi-structured formats directly.
      3. Final Answer:

        VARIANT -> Option B
      4. Quick Check:

        Semi-structured data type = VARIANT [OK]
      Hint: Remember VARIANT stores JSON/Avro data in Snowflake [OK]
      Common Mistakes:
      • Choosing VARCHAR which stores plain text only
      • Confusing INTEGER or BOOLEAN with semi-structured types
      • Thinking JSON needs special external storage
      2. Which of the following is the correct way to extract the value of the key name from a VARIANT column data containing JSON in Snowflake as a string?
      easy
      A. data:name
      B. data['name']
      C. data:name::string
      D. data->'name'

      Solution

      1. Step 1: Understand JSON field extraction syntax in Snowflake

        Snowflake uses colon : to access JSON keys inside VARIANT columns.
      2. Step 2: Cast extracted value to string for proper type

        Using ::string casts the extracted value to string, which is often needed for correct results.
      3. Final Answer:

        data:name::string -> Option C
      4. Quick Check:

        Extract and cast JSON key = data:name::string [OK]
      Hint: Use colon and cast (::string) to get JSON string value [OK]
      Common Mistakes:
      • Using incorrect arrow syntax like data->'name'
      • Not casting extracted value to string
      • Using bracket notation data['name'] without casting to string
      3. Given the JSON data stored in a VARIANT column data:
      {"user": {"id": 101, "active": true}}
      What will the query SELECT data:user:id::int FROM users; return?
      medium
      A. 101
      B. "101"
      C. true
      D. NULL

      Solution

      1. Step 1: Access nested JSON key

        The query accesses user object then id key inside it.
      2. Step 2: Cast the extracted value to integer

        The ::int cast converts the value 101 to integer type.
      3. Final Answer:

        101 -> Option A
      4. Quick Check:

        Nested JSON id cast to int = 101 [OK]
      Hint: Use colon to access nested keys and cast to int for numbers [OK]
      Common Mistakes:
      • Returning string "101" without cast
      • Confusing boolean true with id value
      • Getting NULL due to wrong key access
      4. You run the query SELECT data:user:active FROM users; but get NULL values even though the JSON has "active": true. What is the likely cause?
      medium
      A. Missing cast to BOOLEAN
      B. JSON key is case-sensitive and should be capitalized
      C. Incorrect key path syntax
      D. Column data is not VARIANT type

      Solution

      1. Step 1: Check data type of column

        If the column is not VARIANT, JSON path extraction returns NULL.
      2. Step 2: Confirm correct key path and case

        The key path user:active is correct and JSON keys are case-sensitive but here lowercase matches JSON.
      3. Final Answer:

        Column data is not VARIANT type -> Option D
      4. Quick Check:

        Non-VARIANT column returns NULL on JSON path [OK]
      Hint: Ensure column is VARIANT type to query JSON paths [OK]
      Common Mistakes:
      • Assuming missing cast causes NULL for boolean
      • Using wrong key path syntax
      • Ignoring data type of the column
      5. You have a VARIANT column data storing JSON arrays like
      {"items": [{"id": 1}, {"id": 2}, {"id": 3}]}
      . Which query correctly extracts all id values from the items array as separate rows?
      hard
      A. SELECT value:id::int FROM users, LATERAL FLATTEN(input => data:items);
      B. SELECT data:items:id FROM users;
      C. SELECT data:items[0]:id FROM users;
      D. SELECT FLATTEN(data:items):id FROM users;

      Solution

      1. Step 1: Use FLATTEN to expand JSON array

        FLATTEN function explodes the array into rows, each with a value field.
      2. Step 2: Extract id from each value and cast to int

        Access value:id and cast to integer for each row.
      3. Final Answer:

        SELECT value:id::int FROM users, LATERAL FLATTEN(input => data:items); -> Option A
      4. Quick Check:

        Use FLATTEN with LATERAL and extract id from value [OK]
      Hint: Use LATERAL FLATTEN to turn JSON arrays into rows [OK]
      Common Mistakes:
      • Trying to access array elements without FLATTEN
      • Using incorrect syntax like data:items:id
      • Not casting extracted values to int