Hadoop - Performance Tuning
Q8 of 15 · Application · Hard
You have a large dataset stored in ORC format and want to convert it to Parquet format using PySpark. Which sequence of commands correctly performs this?
A. df = spark.read.parquet('/data/orc_files'); df.write.orc('/data/parquet_files')
B. df = spark.read.orc('/data/orc_files'); df.write.parquet('/data/parquet_files')
C. df = spark.read.csv('/data/orc_files'); df.write.format('parquet').save('/data/parquet_files')
D. df = spark.read.orc('/data/orc_files'); df.save.parquet('/data/parquet_files')
Step-by-Step Solution
  1. Read the ORC files: use spark.read.orc() with the ORC file path to load the data into a DataFrame.
  2. Write the DataFrame as Parquet: use df.write.parquet() with the target path to save the data in Parquet format.
  3. Final Answer: df = spark.read.orc('/data/orc_files'); df.write.parquet('/data/parquet_files') -> Option B
  4. Quick Check: read ORC, then write Parquet = df = spark.read.orc('/data/orc_files'); df.write.parquet('/data/parquet_files') [OK]
Quick Trick: read with spark.read.orc(), then write with df.write.parquet() [OK]
Common Mistakes:
  • Mixing up the read and write formats (e.g. reading as Parquet and writing as ORC, as in Option A)
  • Using csv() to read ORC files (Option C)
  • Wrong method chaining: df.save.parquet() is not a valid DataFrame API; writes go through df.write (Option D)
