Jump into concepts and practice - no test required
or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Recall & Review
beginner
What is CSV file format?
CSV stands for Comma-Separated Values. It stores data in plain text where each line is a row and columns are separated by commas. It's simple and easy to read but does not support complex data types.
Click to reveal answer
beginner
How does JSON format differ from CSV?
JSON (JavaScript Object Notation) stores data as nested objects and arrays. It supports complex and hierarchical data unlike CSV which is flat. JSON is human-readable and widely used for data exchange.
Click to reveal answer
intermediate
What is Parquet file format and why is it used?
Parquet is a columnar storage file format optimized for big data processing. It stores data by columns which makes queries faster and reduces storage space. It is efficient for analytics workloads.
Click to reveal answer
intermediate
Explain Avro file format in simple terms.
Avro is a compact, fast binary file format that stores data with its schema. It is good for data serialization and supports schema evolution, meaning the data structure can change over time without breaking compatibility.
Click to reveal answer
intermediate
Which file formats are best suited for analytics in Snowflake?
Parquet and Avro are best for analytics because they are efficient and support complex data types. CSV and JSON are easier to use but less efficient for large-scale analytics.
Click to reveal answer
Which file format stores data in plain text with columns separated by commas?
ACSV
BJSON
CParquet
DAvro
✗ Incorrect
CSV files store data as plain text with commas separating columns.
Which file format supports nested objects and arrays?
ACSV
BParquet
CJSON
DAvro
✗ Incorrect
JSON supports nested objects and arrays, unlike CSV.
What is a key advantage of Parquet format?
AIt is columnar and efficient for analytics
BIt stores data in rows
CIt is human-readable
DIt uses XML schema
✗ Incorrect
Parquet stores data by columns, making it efficient for analytics.
Which file format includes its schema with the data for compatibility?
AJSON
BAvro
CParquet
DCSV
✗ Incorrect
Avro stores data with its schema, supporting schema evolution.
For large-scale analytics in Snowflake, which formats are preferred?
ACSV and JSON
BOnly JSON
COnly CSV
DParquet and Avro
✗ Incorrect
Parquet and Avro are preferred for analytics due to efficiency and support for complex data.
Describe the main differences between CSV, JSON, Parquet, and Avro file formats.
Think about data structure, readability, and efficiency.
You got /4 concepts.
Explain why Parquet and Avro are better choices than CSV and JSON for analytics in Snowflake.
Focus on performance and data complexity.
You got /4 concepts.
Practice
(1/5)
1. Which file format in Snowflake is best suited for storing hierarchical data with nested structures?
easy
A. Avro
B. JSON
C. Parquet
D. CSV
Solution
Step 1: Understand file format characteristics
JSON supports nested and hierarchical data structures naturally, unlike CSV which is flat.
Step 2: Compare JSON with other formats
Parquet and Avro also support nested data but JSON is most commonly used for hierarchical data due to its readability and flexibility.
Final Answer:
JSON -> Option B
Quick Check:
Hierarchical data = JSON [OK]
Hint: Nested data? Think JSON first [OK]
Common Mistakes:
Choosing CSV for nested data
Confusing Parquet with JSON for readability
Assuming Avro is always best for nested data
2. Which Snowflake file format option correctly specifies that the CSV file uses a semicolon as the field delimiter?
easy
A. FIELD_DELIMITER = ';'
B. FIELD_DELIMITER = ','
C. FIELD_DELIMITER = ':'
D. FIELD_DELIMITER = '|'
Solution
Step 1: Identify the delimiter option for CSV in Snowflake
Snowflake uses FIELD_DELIMITER to specify the character separating fields in CSV files.
Step 2: Match the semicolon delimiter
The semicolon character is ';', so FIELD_DELIMITER = ';' is correct.
Final Answer:
FIELD_DELIMITER = ';' -> Option A
Quick Check:
Semicolon delimiter = FIELD_DELIMITER ';' [OK]
Hint: Delimiter option is FIELD_DELIMITER [OK]
Common Mistakes:
Using comma instead of semicolon
Confusing FIELD_DELIMITER with RECORD_DELIMITER
Using wrong delimiter characters
3. Given this Snowflake file format definition for JSON:
CREATE FILE FORMAT my_json_format TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE;
What happens when you load a JSON file containing an outer array of objects?
medium
A. Snowflake loads the entire array as a single row
B. Snowflake ignores the outer array and loads nothing
C. Snowflake throws an error due to the outer array
D. Snowflake loads each object inside the outer array as a separate row
Solution
Step 1: Understand STRIP_OUTER_ARRAY option
This option tells Snowflake to treat each element inside the outer JSON array as a separate record.
Step 2: Apply to loading behavior
When loading, Snowflake will parse the outer array and load each object inside it as its own row.
Final Answer:
Snowflake loads each object inside the outer array as a separate row -> Option D
Quick Check:
STRIP_OUTER_ARRAY TRUE = separate rows [OK]
Hint: STRIP_OUTER_ARRAY TRUE splits array into rows [OK]
Common Mistakes:
Thinking entire array loads as one row
Expecting an error on outer array
Assuming outer array is ignored
4. You created a Snowflake file format for CSV with:
CREATE FILE FORMAT my_csv_format TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"';
When loading data, some fields with commas inside quotes are split incorrectly. What is the likely issue?
medium
A. FIELD_DELIMITER is missing and defaults to tab
B. FIELD_OPTIONALLY_ENCLOSED_BY should be set to single quote instead of double quote
C. The CSV file uses a different enclosing character than specified
D. The file format type should be JSON, not CSV
Solution
Step 1: Check FIELD_OPTIONALLY_ENCLOSED_BY usage
This option tells Snowflake which character encloses fields optionally, often double quotes for CSV.
Step 2: Identify mismatch with actual file
If the CSV file uses a different enclosing character (like single quotes), Snowflake will not parse fields with commas correctly.
Final Answer:
The CSV file uses a different enclosing character than specified -> Option C
Quick Check:
Enclosing char mismatch breaks parsing [OK]
Hint: Match enclosing char exactly to file [OK]
Common Mistakes:
Changing enclosing char without checking file
Assuming FIELD_DELIMITER defaults to comma always
Switching file format type unnecessarily
5. You want to load a large dataset with complex nested data and efficient compression into Snowflake. Which file format should you choose and why?
hard
A. Parquet, because it supports nested data and is optimized for compression and performance
B. JSON, because it supports nested data and is human-readable
C. CSV, because it is simple and widely supported
D. Avro, because it only supports flat data but is fast
Solution
Step 1: Identify requirements
The dataset is large, has nested data, and needs efficient compression and performance.
Step 2: Compare file formats
CSV is flat and not compressed; JSON is nested but less efficient; Avro supports nested but less optimized than Parquet; Parquet supports nested data and is columnar, offering better compression and query speed.
Final Answer:
Parquet, because it supports nested data and is optimized for compression and performance -> Option A
Quick Check:
Large nested data + compression = Parquet [OK]
Hint: Large nested data? Pick Parquet for speed and size [OK]