Complete the code to perform a broadcast join in Spark.
result = df1.join([1](df2), on='id')
A broadcast join ships the smaller DataFrame to every worker node, so the larger DataFrame can be joined locally without a shuffle, which improves performance.
Complete the code to specify the join type as an inner join.
joined_df = df1.join(df2, on='key', how=[1])
The 'inner' value sets the logical join type. The physical join strategy is chosen by Spark's optimizer, which prefers a sort-merge join by default; a shuffle hash join can be requested explicitly with the 'shuffle_hash' hint.
Fix the error in the code to avoid a costly shuffle join by broadcasting the smaller DataFrame.
from pyspark.sql.functions import [1]
joined = df1.join([1](df2), 'id')
Broadcasting the smaller DataFrame avoids shuffle and improves join performance.
Fill both blanks to create a dictionary comprehension that filters words longer than 4 characters and squares their lengths.
lengths = {word: len(word)[1]2 for word in words if len(word) [2] 4}
We square the length with '**2' and filter words with length greater than 4 using '>'.
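With both blanks filled in, the comprehension runs as below; the `words` list is a made-up sample for illustration:

```python
words = ["spark", "is", "fast", "scalable"]

# Keep only words longer than 4 characters; map each to its squared length.
lengths = {word: len(word) ** 2 for word in words if len(word) > 4}
print(lengths)  # {'spark': 25, 'scalable': 64}
```

Note that the filter is strict ('>'), so a 4-letter word like "fast" is excluded.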
Fill all three blanks to create a dictionary comprehension that uppercases keys, keeps values, and filters positive values.
result = {[1]: [2] for k, v in data.items() if v [3] 0}
Keys are uppercased with 'k.upper()', values kept as 'v', and filtered where 'v > 0'.
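The filled-in comprehension can be exercised with sample data (the `data` dict here is assumed for illustration):

```python
data = {"a": 3, "b": -1, "c": 0, "d": 7}

# Uppercase each key, keep the value unchanged, and drop non-positive values.
result = {k.upper(): v for k, v in data.items() if v > 0}
print(result)  # {'A': 3, 'D': 7}
```

Zero is excluded as well, since the filter requires strictly positive values.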