0
0
Snowflakecloud~5 mins

Semi-structured data querying (JSON, Avro) in Snowflake - Commands & Configuration

Choose your learning style9 modes available
Introduction
Sometimes data comes in formats that are not simple tables, like JSON or Avro. Snowflake lets you easily read and query this kind of data without changing it into tables first.
When you receive data from web apps that send information as JSON.
When you want to analyze logs stored in Avro format without converting them.
When you need to combine structured data with flexible, nested data in one query.
When you want to quickly explore data from APIs that return JSON responses.
When you want to store and query event data that has varying fields.
Commands
Create a table with a VARIANT column to store JSON data. VARIANT can hold any semistructured data.
Terminal
CREATE OR REPLACE TABLE events_json (data VARIANT);
Expected OutputExpected
Table EVENTS_JSON successfully created.
Insert a JSON record into the VARIANT column using PARSE_JSON to convert the string into JSON format.
Terminal
INSERT INTO events_json (data) VALUES (PARSE_JSON('{"user":"alice","action":"login","time":"2024-06-01T12:00:00Z"}'));
Expected OutputExpected
1 row inserted.
Query the JSON data by extracting the 'user' and 'action' fields from the VARIANT column.
Terminal
SELECT data:user AS user, data:action AS action FROM events_json;
Expected OutputExpected
USER | ACTION alice | login
Create a file format object for Avro files so Snowflake knows how to read them.
Terminal
CREATE OR REPLACE FILE FORMAT avro_format TYPE = 'AVRO';
Expected OutputExpected
File format AVRO_FORMAT successfully created.
Create a stage pointing to an S3 bucket with Avro files using the Avro file format.
Terminal
CREATE OR REPLACE STAGE avro_stage URL='s3://example-bucket/avro-data/' FILE_FORMAT = avro_format;
Expected OutputExpected
Stage AVRO_STAGE successfully created.
Create a table with a VARIANT column to load Avro data into.
Terminal
CREATE OR REPLACE TABLE events_avro (data VARIANT);
Expected OutputExpected
Table EVENTS_AVRO successfully created.
Load Avro files from the stage into the table. Snowflake automatically parses Avro into VARIANT.
Terminal
COPY INTO events_avro FROM @avro_stage FILE_FORMAT = (FORMAT_NAME = 'avro_format');
Expected OutputExpected
Copy into EVENTS_AVRO completed. 5 files loaded, 0 errors.
Query nested fields inside the Avro data stored in VARIANT, showing event ID and status.
Terminal
SELECT data:event_id AS event_id, data:details:status AS status FROM events_avro LIMIT 3;
Expected OutputExpected
EVENT_ID | STATUS 123 | success 124 | failed 125 | success
Key Concept

If you remember nothing else from this pattern, remember: Snowflake's VARIANT type lets you store and query JSON or Avro data directly without converting it first.

Common Mistakes
Trying to insert JSON data as plain text without using PARSE_JSON.
Snowflake treats it as a string, so you cannot query JSON fields properly.
Always use PARSE_JSON to convert JSON strings into VARIANT format before inserting.
Not creating or specifying the correct file format when loading Avro files.
Snowflake cannot parse the files correctly and loading fails or data is incorrect.
Create a FILE FORMAT object with TYPE='AVRO' and use it when creating the stage or loading data.
Querying nested JSON or Avro fields without using the colon syntax (data:field).
Snowflake will not find the nested data and returns NULL or errors.
Use colon notation to access nested fields inside VARIANT columns.
Summary
Create tables with VARIANT columns to store JSON or Avro data.
Use PARSE_JSON to insert JSON data correctly.
Create file formats and stages to load Avro files.
Query nested fields using colon notation on VARIANT columns.