Complete the code to read data from a Hadoop data lake using Spark.
df = spark.read.format([1]).load("/data/lake/path")
The parquet format is commonly used in data lakes for efficient storage and querying.
Complete the code to filter data for a specific year in the data lake.
filtered_df = df.filter(df.year == [1])
The year column is numeric, so the filter value should be an integer without quotes.
Fix the error in the code to write data back to the data lake in parquet format.
filtered_df.write.mode([1]).format("parquet").save("/data/lake/output")
The write mode 'append' allows adding data without overwriting existing files.
Fill both blanks to create a dictionary of word lengths for words longer than 3 characters.
lengths = {word: [1] for word in words if len(word) [2] 3}
The dictionary comprehension maps each word to its length only if the word length is greater than 3.
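A sketch of the completed comprehension, following the explanation above: blank [1] is len(word) and blank [2] is the > operator. The words list is hypothetical sample input:

```python
# Hypothetical sample input.
words = ["sun", "moon", "planet", "sky"]

# Completed answer: [1] = len(word), [2] = >.
# Three-letter words are excluded by the len(word) > 3 condition.
lengths = {word: len(word) for word in words if len(word) > 3}
print(lengths)  # {'moon': 4, 'planet': 6}
```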
Fill all three blanks to create a filtered dictionary with uppercase keys and values greater than 0.
result = {[1]: [2] for k, v in data.items() if v [3] 0}
This comprehension creates a dictionary with keys in uppercase and includes only items with values greater than zero.
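A sketch of the completed comprehension, following the explanation above: blank [1] is k.upper(), blank [2] is v, and blank [3] is the > operator. The data dictionary is hypothetical sample input:

```python
# Hypothetical sample input.
data = {"apples": 3, "pears": 0, "plums": -1, "kiwis": 5}

# Completed answer: [1] = k.upper(), [2] = v, [3] = >.
# Keys are uppercased; items with non-positive values are dropped.
result = {k.upper(): v for k, v in data.items() if v > 0}
print(result)  # {'APPLES': 3, 'KIWIS': 5}
```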