Hadoop - Performance Tuning
Given the following PySpark code snippet, in what format is the data saved, and is its schema preserved?
df.write.format('orc').save('/data/output')
A. Data saved in ORC format with schema preserved
B. Data saved in CSV format without schema
C. Data saved in Parquet format with schema preserved
D. Data saved in Avro format with schema preserved
Step-by-Step Solution
Solution:
  1. Step 1: Identify the format used in the code

    The code uses df.write.format('orc'), which means data is saved in ORC format.
  2. Step 2: Understand ORC properties

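    ORC (Optimized Row Columnar) stores data column-wise and is self-describing: each file records the column names and types, so the schema is preserved and is recovered automatically when the files are read back.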
  3. Final Answer:

    Data saved in ORC format with schema preserved -> Option A
  4. Quick Check:

    ORC format with schema = Data saved in ORC format with schema preserved
Quick Trick: format('orc') means ORC with schema saved
Common Mistakes:
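  • Assuming the output is CSV (the format is set explicitly here, and Spark's built-in default when no format is given is Parquet, not CSV)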
  • Confusing ORC with Parquet or Avro
  • Ignoring schema preservation in ORC