Complete the code to perform a join between two DataFrames on the 'id' column.
result = df1.join(df2, on='id', how='inner')
We join on the 'id' column because it is the common key between the two DataFrames.
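Conceptually, an inner join on 'id' keeps only the rows whose id appears on both sides, merging the matching columns. A plain-Python sketch of that behavior (no Spark; the sample rows are made up for illustration):

```python
# Plain-Python illustration of an inner join on 'id' (not Spark).
df1_rows = [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}, {'id': 3, 'name': 'c'}]
df2_rows = [{'id': 2, 'score': 10}, {'id': 3, 'score': 20}, {'id': 4, 'score': 30}]

# Index the right side by the join key, then keep only matching left rows.
right_by_id = {r['id']: r for r in df2_rows}
result = [
    {**left, **right_by_id[left['id']]}
    for left in df1_rows
    if left['id'] in right_by_id
]
# Only ids 2 and 3 appear on both sides, so only those rows survive.
```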
Complete the code to add a salt column with random integers between 0 and 9 to the DataFrame.
from pyspark.sql.functions import rand
salted_df = df.withColumn('salt', (rand() * 10).cast('integer'))
We cast the salt column to 'integer' to get whole numbers for salting.
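A quick plain-Python check of the same idea: multiplying a uniform random value in [0, 1) by 10 and truncating to an integer always lands in 0 through 9, which is what spreads each hot key across ten salt buckets:

```python
import random

# int(random.random() * 10) mirrors (rand() * 10).cast('integer'):
# the cast truncates toward zero, it does not round.
salts = [int(random.random() * 10) for _ in range(1000)]

# Every generated salt is a whole number in the range 0..9.
assert min(salts) >= 0 and max(salts) <= 9
```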
Fix the error in the code to perform a salted join by matching both 'id' and 'salt' columns.
joined_df = df1.join(df2, on=['id', 'salt'], how='inner')
The salt column is named 'salt' in both DataFrames and must be used in the join condition.
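For the salted join to be lossless, the smaller side has to be replicated once per salt value (0 through 9 here) before joining on both columns, so that every salted row of the big side finds its match. A plain-Python sketch of that replicate-and-match step (variable names are illustrative, not from the original code):

```python
import random

NUM_SALTS = 10

# Big (skewed) side: every row of a hot key gets one random salt.
big = [{'id': 'hot', 'salt': random.randrange(NUM_SALTS), 'v': i} for i in range(100)]

# Small side: one row per id, replicated across all salt values.
small = [{'id': 'hot', 'w': 42}]
small_exploded = [{**row, 'salt': s} for row in small for s in range(NUM_SALTS)]

# Join on (id, salt): every big-side row finds exactly one match, so no rows are lost.
index = {(r['id'], r['salt']): r for r in small_exploded}
joined = [{**b, **index[(b['id'], b['salt'])]} for b in big if (b['id'], b['salt']) in index]
assert len(joined) == len(big)
```

The replication is the price of salting: the small side grows by a factor of NUM_SALTS, but the big side's hot key is now split across NUM_SALTS partitions.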
Fill both blanks to create a salted key by concatenating 'id' and 'salt' as strings.
from pyspark.sql.functions import concat, col
salted_df = df.withColumn('salted_key', concat(col('id').cast('string'), col('salt').cast('string')))
We concatenate 'id' and 'salt' columns to create a unique salted key for joining.
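String-wise, the salted key is just the id followed by the salt. A plain-Python equivalent of the concatenation (sample rows are illustrative):

```python
# Plain-Python equivalent of concat on the 'id' and 'salt' columns:
# cast both values to strings and append them.
rows = [{'id': 7, 'salt': 3}, {'id': 7, 'salt': 9}]
for r in rows:
    r['salted_key'] = str(r['id']) + str(r['salt'])
# Same id with different salts yields different salted keys.
```

Because the salts here are single digits 0 through 9, bare concatenation is unambiguous; with a wider salt range, inserting a separator such as '_' between the two parts avoids collisions like id '1' + salt '23' versus id '12' + salt '3'.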
Fill all three blanks to filter the DataFrame for skewed keys where count is greater than 1000.
from pyspark.sql.functions import col
skewed_keys = df.groupBy('id').count().filter(col('count') > 1000)
We group by 'id', then filter where count is greater than 1000 to find skewed keys.
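The same group-and-filter logic can be sketched in plain Python with a Counter, which makes the threshold behavior easy to see (the id values and counts below are made up):

```python
from collections import Counter

# Plain-Python sketch of groupBy('id').count().filter(count > 1000).
ids = ['hot'] * 1500 + ['warm'] * 1001 + ['cold'] * 5
counts = Counter(ids)

# Keep only ids whose row count is strictly greater than 1000.
skewed_keys = sorted(k for k, c in counts.items() if c > 1000)
```

Note the strict inequality: a key with exactly 1000 rows is not flagged as skewed.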