Complete the code to cache the DataFrame to optimize performance.
df = spark.read.csv('data.csv')
df.[1]()
Caching the DataFrame stores it in memory (or memory and disk) after the first action computes it, so subsequent actions reuse the cached data instead of recomputing it from the source. This speeds up repeated operations and reduces the load from repeated expensive computations.
Complete the code to repartition the DataFrame for better parallelism.
df = df.[1](10)
Repartitioning sets the number of partitions (here, 10), which can improve parallel processing and help mitigate data skew or resource bottlenecks. Note that repartition() can either increase or decrease the partition count, and it triggers a full shuffle of the data.
Fix the error in the code to avoid job failure by using the correct action.
result = df.filter(df.age > 30)
result.[1]()
'show()' is an action that triggers computation and displays results. Using 'cache()' or 'persist()' alone does not trigger execution, and 'map' is a transformation, not an action.
Fill both blanks to create a dictionary of word lengths for words longer than 3 characters.
lengths = {word: [1] for word in words if len(word) [2] 3}
The dictionary comprehension maps each word to its length using 'len(word)'. The condition filters words with length greater than 3 using '>'.
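With the blanks filled in, the completed comprehension looks like this (the word list is sample data, assumed for illustration):

```python
words = ["spark", "is", "fast", "an", "engine"]  # assumed sample input

# Map each word longer than 3 characters to its length.
lengths = {word: len(word) for word in words if len(word) > 3}
print(lengths)  # {'spark': 5, 'fast': 4, 'engine': 6}
```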
Fill all three blanks to build a filtered dictionary that maps each word longer than 4 characters to its uppercase form.
result = [1](([2], [3]) for [2] in words if len([2]) > 4)
This code builds a dictionary with 'dict' from a generator of (key, value) pairs: the keys are the original words ('word') and the values are the uppercase words ('word.upper()'), keeping only words longer than 4 characters. Note that 'key: value' syntax is only valid inside a brace comprehension; 'dict(word: word.upper() ...)' would be a syntax error, so 'dict()' must be given pairs instead.
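A runnable sketch with an assumed word list, showing the dict-from-pairs form alongside the equivalent brace comprehension:

```python
words = ["spark", "is", "a", "fast", "engine"]  # assumed sample input

# dict() consumes (key, value) pairs from the generator.
result = dict((word, word.upper()) for word in words if len(word) > 4)
print(result)  # {'spark': 'SPARK', 'engine': 'ENGINE'}

# Equivalent brace-comprehension form, where key: value syntax is valid.
same = {word: word.upper() for word in words if len(word) > 4}
assert same == result
```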