
Data serialization (Avro, Parquet, ORC) in Hadoop - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to specify the Parquet file format when saving a DataFrame in Spark.

Hadoop
df.write.format("[1]").save("/data/output")
A. csv
B. json
C. text
D. parquet
Common Mistakes
Using 'csv' or 'json' instead of 'parquet' will save in the wrong format.
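Omitting format() makes Spark fall back to its default data source (the value of spark.sql.sources.default, which is parquet unless it has been changed), so the output format then depends on configuration.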
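
As a minimal sketch of the point above, the helper below names the format explicitly instead of relying on the session default. The function name save_as_parquet is hypothetical, and df is assumed to be a PySpark DataFrame.

```python
def save_as_parquet(df, path):
    """Write df to path as Parquet, naming the format explicitly.

    Assumes df is a PySpark DataFrame; the function name is illustrative.
    """
    # format() picks the data source; save() performs the write.
    # Without format(), Spark would use spark.sql.sources.default instead.
    df.write.format("parquet").save(path)
```

Calling save_as_parquet(df, "/data/output") would then behave like the completed line in the task.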
2. Fill in the blank (medium)

Complete the code to read an Avro file into a Spark DataFrame.

Hadoop
spark.read.format("[1]").load("/data/input.avro")
A. avro
B. parquet
C. orc
D. csv
Common Mistakes
Using 'parquet' or 'orc' format to read Avro files causes errors.
The Avro data source ships as the external spark-avro module; if it is not on the classpath, format("avro") fails because Spark cannot find the data source.
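
A small sketch of the read, assuming spark is an active SparkSession that was started with the spark-avro module available. The function name read_avro is hypothetical, and the package coordinates in the docstring are left as placeholders because they depend on your Scala and Spark versions.

```python
def read_avro(spark, path):
    """Load an Avro file into a DataFrame.

    Assumes spark is an active SparkSession launched with the external
    spark-avro module on the classpath, for example via
    --packages org.apache.spark:spark-avro_<scala-version>:<spark-version>
    """
    # format("avro") selects the Avro data source; load() returns a DataFrame.
    return spark.read.format("avro").load(path)
```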
3. Fill in the blank (hard)

Fix the error in the code to write a DataFrame in ORC format.

Hadoop
df.write.[1]("/data/output.orc")
A. saveAsTextFile
B. save
C. orc
D. saveAsOrcFile
Common Mistakes
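saveAsTextFile is an RDD method, not a DataFrameWriter method, so df.write.saveAsTextFile(...) raises an error instead of writing ORC.
saveAsOrcFile does not exist, so calling it raises an error.
save(path) without format("orc") runs, but it writes in the session's default format rather than ORC.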
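
The two usual ways to write ORC can be sketched as follows, assuming df is a PySpark DataFrame. The function name save_as_orc and its use_shortcut flag are hypothetical and exist only to show both spellings side by side.

```python
def save_as_orc(df, path, use_shortcut=True):
    """Write df to path in ORC format.

    Assumes df is a PySpark DataFrame; use_shortcut is an illustrative flag.
    """
    if use_shortcut:
        # DataFrameWriter.orc(path) is the format-specific shortcut.
        df.write.orc(path)
    else:
        # The generic form: name the format, then save.
        df.write.format("orc").save(path)
```

Both branches are intended to produce ORC output; the shortcut is simply shorter to type.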
4. Fill in the blank (hard)

Fill both blanks to create a DataFrame from a Parquet file and select only the 'name' column.

Hadoop
df = spark.read.[1]("/data/users.parquet").select("[2]")
A. parquet
B. avro
C. name
D. age
Common Mistakes
Using 'avro' format to read Parquet files causes errors.
Selecting a column that does not exist raises an AnalysisException; selecting 'age' runs but returns the wrong column.
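
One way to guard against a misspelled column is to check df.columns before selecting. This is only a sketch: read_column is a hypothetical helper, and spark is assumed to be an active SparkSession.

```python
def read_column(spark, path, column):
    """Read a Parquet file and return a DataFrame with a single column.

    Assumes spark is an active SparkSession; the helper name is illustrative.
    """
    df = spark.read.parquet(path)
    # df.columns is a plain list of column names, so the check is cheap
    # and gives a clearer message than a failed select().
    if column not in df.columns:
        raise ValueError(f"column {column!r} not found; available: {df.columns}")
    return df.select(column)
```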
5. Fill in the blank (hard)

Fill all three blanks to write a DataFrame in Avro format with overwrite mode and save to '/data/avro_output'.

Hadoop
df.write.mode("[1]").format("[2]").save("[3]")
A. append
B. overwrite
C. avro
D. /data/avro_output
Common Mistakes
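Using 'append' mode adds to any existing data at the path instead of replacing it.
Using a format other than 'avro' writes the data in a different file format.
Passing a different path writes the output somewhere other than '/data/avro_output'.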
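
The mode, format, and path chain can be sketched as a small helper. The name save_as_avro and the VALID_MODES set are assumptions for illustration (the set lists the lowercase save-mode strings commonly used with DataFrameWriter.mode), and df is assumed to be a PySpark DataFrame in a session where spark-avro is available.

```python
# Lowercase save-mode strings accepted by DataFrameWriter.mode();
# this set is an illustrative guard, not part of Spark itself.
VALID_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}

def save_as_avro(df, path, mode="overwrite"):
    """Write df to path as Avro using the given save mode.

    Assumes df is a PySpark DataFrame and spark-avro is on the classpath.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown save mode: {mode!r}")
    # mode() decides what happens when the path already holds data:
    # "overwrite" replaces it, "append" adds to it.
    df.write.mode(mode).format("avro").save(path)
```

With the defaults, save_as_avro(df, "/data/avro_output") mirrors the completed line in the task.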