Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to write the DataFrame as a Parquet file.
Apache Spark
df.write.parquet([1])
Common Mistakes
Passing the DataFrame itself instead of a path.
Using the method name as a string.
Not providing any argument.
Explanation
The parquet method requires the output path as a string.
2. Fill in the blank (medium)
Complete the code to partition the output by the column 'year'.
Apache Spark
df.write.partitionBy([1]).parquet("/output/path")
Common Mistakes
Passing the column name without quotes.
Using the wrong column name.
Using the method name as argument.
Explanation
The partitionBy method expects the column name as a string.
3. Fill in the blank (hard)
Fix the error in the code to write partitioned data by 'country'.
Apache Spark
df.write.partitionBy([1]).save("/data/output")
Common Mistakes
Passing the column name without quotes.
Passing the method name instead of the column name.
Using the save method incorrectly.
Explanation
The partitionBy method requires the column name as a string, so it must be in quotes.
4. Fill in the blanks (hard)
Fill both blanks to write the DataFrame partitioned by 'state' and saved as JSON.
Apache Spark
df.write.[1]By([2]).json("/json/output")
Common Mistakes
Using 'partition' instead of 'partitionBy'.
Not quoting the column name.
Using 'save' instead of 'json'.
Explanation
The method is partitionBy and the column name must be a string.
5. Fill in the blanks (hard)
Fill all three blanks to write the DataFrame partitioned by 'category', saved as Parquet, and overwriting existing data.
Apache Spark
df.write.mode([1]).[2]By([3]).parquet("/final/output")
Common Mistakes
Using 'append' mode instead of 'overwrite'.
Not using 'partitionBy' method.
Passing the column name without quotes.
Explanation
Use mode("overwrite") to overwrite data, partitionBy to partition, and the column name as a string.