Given the following PySpark code snippet, in what format will the data be saved, and will its schema be preserved?

Hadoop - Performance Tuning

df.write.format('orc').save('/data/output')

A. Data saved in ORC format with schema preserved

B. Data saved in CSV format without schema

C. Data saved in Parquet format with schema preserved

D. Data saved in Avro format with schema preserved

Step-by-Step Solution

Step 1: Identify the format used in the code
The code uses df.write.format('orc'), which means data is saved in ORC format.
Step 2: Understand ORC properties
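ORC is a columnar format that records each column's name and type in the file metadata, so the DataFrame schema is saved with the data and does not have to be supplied again on read.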
Final Answer:
Data saved in ORC format with schema preserved -> Option A
Quick Check:
format('orc') selects ORC, and ORC files carry their own schema, so Option A matches.

Quick Trick: format('orc') means the output is ORC, which stores its schema alongside the data.
