
File formats (CSV, JSON, Parquet, Avro) in Snowflake - Commands & Configuration

Introduction
When you load data into Snowflake or unload it to files, a file format tells Snowflake how those files are organized. Snowflake supports formats such as CSV, JSON, Parquet, and Avro, and a file format object describes the file structure so the data is read or written correctly.
When you want to load a simple table from a text file with columns separated by commas, use CSV.
When you have nested or complex data like lists or objects, use JSON.
When you want efficient storage and fast queries on large datasets, use Parquet.
When you need a compact binary format that supports schema evolution, use Avro.
Config File - file_format.sql
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  NULL_IF = ('NULL', 'null');

CREATE OR REPLACE FILE FORMAT my_json_format
  TYPE = 'JSON';

CREATE OR REPLACE FILE FORMAT my_parquet_format
  TYPE = 'PARQUET';

CREATE OR REPLACE FILE FORMAT my_avro_format
  TYPE = 'AVRO';

This SQL script creates four file formats in Snowflake:

  • my_csv_format: Defines how CSV files are read: comma as the field separator, one header row skipped, fields optionally enclosed in double quotes, and the strings NULL and null loaded as SQL NULL.
  • my_json_format: Defines JSON file format for semi-structured data.
  • my_parquet_format: Defines Parquet file format for columnar storage.
  • my_avro_format: Defines Avro file format for compact binary data.
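A file format does nothing until a stage or a COPY INTO statement refers to it. The sketch below shows one way to use my_csv_format for a load. The stage my_stage, the table customers, its columns, and the path customers/ are hypothetical names chosen for illustration.

```sql
-- Hypothetical internal stage that uses my_csv_format as its default format.
CREATE OR REPLACE STAGE my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

-- Hypothetical target table; column names are illustrative only.
CREATE OR REPLACE TABLE customers (
  id    INTEGER,
  name  STRING,
  email STRING
);

-- Load the staged CSV files, naming the file format explicitly.
COPY INTO customers
  FROM @my_stage/customers/
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  ON_ERROR = 'ABORT_STATEMENT';
```

Naming the format in COPY INTO overrides the stage default for that statement, which keeps the load explicit about how the files are parsed.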
Commands
This command creates a CSV file format named my_csv_format. It tells Snowflake to split fields on commas, skip the header row, accept fields optionally enclosed in double quotes, and load the strings NULL and null as SQL NULL.
Terminal
CREATE OR REPLACE FILE FORMAT my_csv_format TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"' NULL_IF = ('NULL', 'null');
Expected Output
File format MY_CSV_FORMAT successfully created.
TYPE - Specifies the file format type, here CSV.
FIELD_DELIMITER - Defines the character that separates fields, here a comma.
SKIP_HEADER - Sets the number of lines to skip at the start of each file, here 1 for the header row.
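To see how these options affect parsing before loading anything, a staged file can be queried directly with the named format. This is a minimal sketch: the stage my_stage and the file customers.csv are assumed to exist and are not part of the original lesson.

```sql
-- Preview how my_csv_format splits a staged file into columns.
-- $1, $2, $3 are the positional columns of each parsed row.
SELECT t.$1, t.$2, t.$3
FROM @my_stage/customers.csv (FILE_FORMAT => 'my_csv_format') t
LIMIT 10;
```

If the header row shows up in the result, or quoted values appear split across columns, the format options need adjusting before the load.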
This command creates a JSON file format named my_json_format. It tells Snowflake to expect JSON structured data.
Terminal
CREATE OR REPLACE FILE FORMAT my_json_format TYPE = 'JSON';
Expected Output
File format MY_JSON_FORMAT successfully created.
TYPE - Specifies the file format type, here JSON.
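JSON files are typically loaded into a single VARIANT column and then queried with path notation. The following sketch assumes a hypothetical stage my_stage, a hypothetical table raw_events, and illustrative field names (user.id, event_type) that will differ in real data.

```sql
-- Hypothetical landing table with one VARIANT column for the raw documents.
CREATE OR REPLACE TABLE raw_events (payload VARIANT);

-- Load staged JSON files using the named JSON format.
COPY INTO raw_events
  FROM @my_stage/events/
  FILE_FORMAT = (FORMAT_NAME = 'my_json_format');

-- Extract nested fields with path notation and cast them to SQL types.
SELECT
  payload:user.id::INTEGER   AS user_id,
  payload:event_type::STRING AS event_type
FROM raw_events;
```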
This command creates a Parquet file format named my_parquet_format. Parquet is a columnar storage format good for big data.
Terminal
CREATE OR REPLACE FILE FORMAT my_parquet_format TYPE = 'PARQUET';
Expected Output
File format MY_PARQUET_FORMAT successfully created.
TYPE - Specifies the file format type, here Parquet.
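Parquet files carry column names, so a load can map file columns to table columns by name rather than by position. A minimal sketch, assuming a hypothetical stage my_stage and a hypothetical table sales whose column names match those in the Parquet files:

```sql
-- Hypothetical target table; names must match the Parquet column names.
CREATE OR REPLACE TABLE sales (
  order_id   INTEGER,
  amount     NUMBER(10, 2),
  order_date DATE
);

-- Load staged Parquet files, matching columns by name (ignoring case).
COPY INTO sales
  FROM @my_stage/sales/
  FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```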
This command creates an Avro file format named my_avro_format. Avro is a compact binary format that supports schema changes.
Terminal
CREATE OR REPLACE FILE FORMAT my_avro_format TYPE = 'AVRO';
Expected Output
File format MY_AVRO_FORMAT successfully created.
TYPE - Specifies the file format type, here Avro.
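Staged Avro data is exposed as a single semi-structured column, so it can be inspected with the same path notation used for JSON. The sketch below assumes a hypothetical stage my_stage, a hypothetical path users/, and illustrative field names.

```sql
-- Inspect staged Avro records; $1 holds each record as a VARIANT value.
SELECT
  $1:id::INTEGER  AS id,
  $1:name::STRING AS name
FROM @my_stage/users/ (FILE_FORMAT => 'my_avro_format')
LIMIT 10;
```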
This command shows details about the CSV file format to verify it was created correctly.
Terminal
DESC FILE FORMAT my_csv_format;
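Expected Output
name                         | my_csv_format
format_type                  | CSV
field_delimiter              | ,
skip_header                  | 1
field_optionally_enclosed_by | "
null_if                      | (NULL, null)
(Simplified view. Snowflake returns one row per property, with the columns property, property_type, property_value, and property_default, and it also lists options left at their defaults.)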
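Besides describing a single format, it can help to list all formats and to adjust one in place. A short sketch using the format names from this lesson; the pipe delimiter is only an example value.

```sql
-- List file formats in the current schema whose names match the pattern.
SHOW FILE FORMATS LIKE 'my_%_format';

-- Change one option on an existing format without recreating it.
ALTER FILE FORMAT my_csv_format SET FIELD_DELIMITER = '|';

-- Confirm the new setting.
DESC FILE FORMAT my_csv_format;
```

ALTER FILE FORMAT changes only the listed option, whereas CREATE OR REPLACE redefines the whole object.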
Key Concept

If you remember nothing else from this pattern, remember: file formats tell Snowflake how to read or write your data files correctly.

Common Mistakes
Not specifying SKIP_HEADER for CSV files with headers
Snowflake will treat the header row as data, causing errors or wrong data loading.
Always set SKIP_HEADER = 1 if your CSV file has a header row.
Using wrong TYPE for the file format
Snowflake will fail to parse the file because it expects a different format.
Choose the correct TYPE like CSV, JSON, PARQUET, or AVRO matching your file.
Not enclosing fields properly in CSV
Fields with commas inside may break parsing if not enclosed by quotes.
Use FIELD_OPTIONALLY_ENCLOSED_BY = '"' to handle quoted fields.
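One way to catch these mistakes early is to run the load in validation mode, which parses the files and reports problems without inserting rows. This sketch assumes a hypothetical stage my_stage and a hypothetical table customers.

```sql
-- Parse the staged files with my_csv_format and return any errors.
-- No rows are loaded while VALIDATION_MODE is set.
COPY INTO customers
  FROM @my_stage/customers/
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  VALIDATION_MODE = 'RETURN_ERRORS';
```

An empty result suggests the files parse cleanly with the current options; the real COPY INTO can then be run without VALIDATION_MODE.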
Summary
Create file formats in Snowflake to define how data files are read or written.
Use commands like CREATE FILE FORMAT with TYPE and options for CSV, JSON, Parquet, or Avro.
Verify file formats with DESC FILE FORMAT to ensure correct settings.

Practice

1. Which file format in Snowflake is best suited for storing hierarchical data with nested structures?
easy
A. Avro
B. JSON
C. Parquet
D. CSV

Solution

  1. Step 1: Understand file format characteristics

    JSON supports nested and hierarchical data structures naturally, unlike CSV which is flat.
  2. Step 2: Compare JSON with other formats

    Parquet and Avro also support nested data, but they are binary formats. JSON is the most common choice for hierarchical data because it is readable and flexible.
  3. Final Answer:

    JSON -> Option B
  4. Quick Check:

    Hierarchical data = JSON [OK]
Hint: Nested data? Think JSON first [OK]
Common Mistakes:
  • Choosing CSV for nested data
  • Confusing Parquet with JSON for readability
  • Assuming Avro is always best for nested data
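To make the nested case concrete, the sketch below expands an array inside a JSON document into rows. The table raw_orders, its VARIANT column payload, and the fields order_id, items, and sku are hypothetical.

```sql
-- Each JSON document is assumed to hold an order with an "items" array.
-- LATERAL FLATTEN produces one output row per array element.
SELECT
  o.payload:order_id::INTEGER AS order_id,
  item.value:sku::STRING      AS sku
FROM raw_orders o,
     LATERAL FLATTEN(input => o.payload:items) item;
```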
2. Which Snowflake file format option correctly specifies that the CSV file uses a semicolon as the field delimiter?
easy
A. FIELD_DELIMITER = ';'
B. FIELD_DELIMITER = ','
C. FIELD_DELIMITER = ':'
D. FIELD_DELIMITER = '|'

Solution

  1. Step 1: Identify the delimiter option for CSV in Snowflake

    Snowflake uses FIELD_DELIMITER to specify the character separating fields in CSV files.
  2. Step 2: Match the semicolon delimiter

    The semicolon character is ';', so FIELD_DELIMITER = ';' is correct.
  3. Final Answer:

    FIELD_DELIMITER = ';' -> Option A
  4. Quick Check:

    Semicolon delimiter = FIELD_DELIMITER ';' [OK]
Hint: Delimiter option is FIELD_DELIMITER [OK]
Common Mistakes:
  • Using comma instead of semicolon
  • Confusing FIELD_DELIMITER with RECORD_DELIMITER
  • Using wrong delimiter characters
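As a concrete version of the answer, a complete format definition for semicolon-separated files might look like this. The format name my_semicolon_csv_format is made up for the example.

```sql
-- CSV format for files whose fields are separated by semicolons.
CREATE OR REPLACE FILE FORMAT my_semicolon_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ';'   -- separates fields within a row
  SKIP_HEADER = 1;        -- assumes the files start with one header row
```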
3. Given this Snowflake file format definition for JSON:
CREATE FILE FORMAT my_json_format TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE;

What happens when you load a JSON file containing an outer array of objects?
medium
A. Snowflake loads the entire array as a single row
B. Snowflake ignores the outer array and loads nothing
C. Snowflake throws an error due to the outer array
D. Snowflake loads each object inside the outer array as a separate row

Solution

  1. Step 1: Understand STRIP_OUTER_ARRAY option

    This option tells Snowflake to treat each element inside the outer JSON array as a separate record.
  2. Step 2: Apply to loading behavior

    When loading, Snowflake will parse the outer array and load each object inside it as its own row.
  3. Final Answer:

    Snowflake loads each object inside the outer array as a separate row -> Option D
  4. Quick Check:

    STRIP_OUTER_ARRAY TRUE = separate rows [OK]
Hint: STRIP_OUTER_ARRAY TRUE splits array into rows [OK]
Common Mistakes:
  • Thinking entire array loads as one row
  • Expecting an error on outer array
  • Assuming outer array is ignored
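The behavior can be checked with a small load. In this sketch the format name my_json_array_format, the stage my_stage, the file orders.json, and the table raw_orders are all hypothetical.

```sql
-- JSON format that removes the outer [ ] so each element becomes a record.
CREATE OR REPLACE FILE FORMAT my_json_array_format
  TYPE = 'JSON'
  STRIP_OUTER_ARRAY = TRUE;

-- Hypothetical landing table.
CREATE OR REPLACE TABLE raw_orders (payload VARIANT);

-- Load a file shaped like [ {...}, {...}, {...} ].
COPY INTO raw_orders
  FROM @my_stage/orders.json
  FILE_FORMAT = (FORMAT_NAME = 'my_json_array_format');

-- Expect one row per element of the outer array.
SELECT COUNT(*) AS row_count FROM raw_orders;
```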
4. You created a Snowflake file format for CSV with:
CREATE FILE FORMAT my_csv_format TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"';

When loading data, some fields with commas inside quotes are split incorrectly. What is the likely issue?
medium
A. FIELD_DELIMITER is missing and defaults to tab
B. FIELD_OPTIONALLY_ENCLOSED_BY should be set to single quote instead of double quote
C. The CSV file uses a different enclosing character than specified
D. The file format type should be JSON, not CSV

Solution

  1. Step 1: Check FIELD_OPTIONALLY_ENCLOSED_BY usage

    This option tells Snowflake which character may enclose a field. For CSV this is most often the double quote.
  2. Step 2: Identify mismatch with actual file

    If the CSV file uses a different enclosing character (like single quotes), Snowflake will not parse fields with commas correctly.
  3. Final Answer:

    The CSV file uses a different enclosing character than specified -> Option C
  4. Quick Check:

    Enclosing char mismatch breaks parsing [OK]
Hint: Match enclosing char exactly to file [OK]
Common Mistakes:
  • Changing enclosing char without checking file
  • Blaming the FIELD_DELIMITER default, which is already a comma for CSV
  • Switching file format type unnecessarily
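If the file really does enclose fields in single quotes, the format has to say so. A minimal sketch with a hypothetical format name; the single quote is written here in its hex form, 0x27.

```sql
-- CSV format for files that wrap fields in single quotes, e.g. 'Smith, John'.
CREATE OR REPLACE FILE FORMAT my_single_quote_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '0x27';  -- hex code for the single quote character
```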
5. You want to load a large dataset with complex nested data and efficient compression into Snowflake. Which file format should you choose and why?
hard
A. Parquet, because it supports nested data and is optimized for compression and performance
B. JSON, because it supports nested data and is human-readable
C. CSV, because it is simple and widely supported
D. Avro, because it only supports flat data but is fast

Solution

  1. Step 1: Identify requirements

    The dataset is large, has nested data, and needs efficient compression and performance.
  2. Step 2: Compare file formats

    CSV is flat text and cannot represent nesting. JSON handles nested data but is verbose text. Avro supports nested data but is row-oriented. Parquet supports nested data and is columnar, which typically gives better compression and faster analytical reads.
  3. Final Answer:

    Parquet, because it supports nested data and is optimized for compression and performance -> Option A
  4. Quick Check:

    Large nested data + compression = Parquet [OK]
Hint: Large nested data? Pick Parquet for speed and size [OK]
Common Mistakes:
  • Choosing CSV for nested data
  • Preferring JSON despite compression needs
  • Misunderstanding Avro's capabilities
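Nested Parquet data can be read with the same path notation used for other semi-structured formats. The sketch below queries staged files directly. The stage my_stage, the path orders/, and the nested fields are assumptions for illustration.

```sql
-- Read nested fields from staged Parquet files.
-- $1 holds each Parquet row as a VARIANT value.
SELECT
  $1:order_id::INTEGER             AS order_id,
  $1:customer.address.city::STRING AS city
FROM @my_stage/orders/ (FILE_FORMAT => 'my_parquet_format')
LIMIT 10;
```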