Why data format affects performance in Apache Spark - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to read a Parquet file into a Spark DataFrame.

Apache Spark
df = spark.read.[1]("data.parquet")
Options:
A. parquet
B. csv
C. json
D. text
Common Mistakes
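Using csv or json to read a Parquet file fails or returns meaningless rows, because Parquet is a binary columnar format rather than delimited text.
Using text reads each line into a single string column named value, so the column structure is lost.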
Task 2: Fill in the blank (medium)

Complete the code to write a DataFrame in an efficient columnar format.

Apache Spark
df.write.[1]("output_path")
Options:
A. parquet
B. csv
C. json
D. text
Common Mistakes
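Writing as csv or json produces row-based text, so later reads must parse every row (and infer or declare column types) even when only a few columns are needed.
Writing as text only works for a DataFrame with a single string column, so it cannot store structured data.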
Task 3: Fill in the blank (hard)

Complete the code to read a JSON file into a Spark DataFrame.

Apache Spark
df = spark.read.[1]("data.json")
Options:
A. parquet
B. csv
C. text
D. json
Common Mistakes
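Using parquet to read a JSON file fails because the file is not in Parquet format, and csv splits each JSON line on commas instead of parsing its fields.
Using text reads each line as a raw string, so the JSON fields are never parsed into columns.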
Task 4: Fill in the blank (hard)

Fill both blanks to create a dictionary of word lengths for words longer than 3 characters.

Python
lengths = {word: [1] for word in words if len(word) [2] 3}
Options:
A. len(word)
B. <=
C. >
D. word
Common Mistakes
Using the word itself as value instead of its length.
Using <= instead of > in the condition.
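The completed comprehension is plain Python that runs on the driver, not a Spark operation. This sketch assumes a hypothetical `words` list, since the task does not show where `words` comes from.

```python
# Hypothetical input list; the task does not define `words`.
words = ["spark", "df", "parquet", "json", "csv"]

# Keep words longer than 3 characters, mapping each word to its length.
lengths = {word: len(word) for word in words if len(word) > 3}

print(lengths)  # {'spark': 5, 'parquet': 7, 'json': 4}
```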
Task 5: Fill in the blank (hard)

Fill all three blanks to build a dictionary that keeps only the entries whose values are greater than zero, with each key converted to uppercase.

Python
result = { [1]: [2] for k, v in data.items() if v [3] 0 }
Options:
A. k.upper()
B. v
C. >
D. k
Common Mistakes
Using original keys instead of uppercase.
Using <= or < instead of > in the condition.
Swapping keys and values.
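A sketch of the completed comprehension, assuming a hypothetical `data` mapping since the task does not define one. Entries with a value of zero or below are dropped, and the surviving keys are uppercased in their original insertion order.

```python
# Hypothetical input mapping; the task does not define `data`.
data = {"parquet": 3, "csv": 0, "json": -1, "orc": 2}

# Uppercase each key and keep only entries whose value is greater than zero.
result = {k.upper(): v for k, v in data.items() if v > 0}

print(result)  # {'PARQUET': 3, 'ORC': 2}
```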