Apache Spark | data | ~10 mins

Handling skewed joins in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to perform a join between two DataFrames on the 'id' column.

Apache Spark
result = df1.join(df2, on=[1], how='inner')
Options:
A. 'name'
B. 'id'
C. 'age'
D. 'date'
Common Mistakes
Joining on a column that does not exist in both DataFrames.
Using the wrong join type.
2. Fill in the blank
medium

Complete the code to add a salt column with random integers between 0 and 9 to the DataFrame.

Apache Spark
from pyspark.sql.functions import rand
salted_df = df.withColumn('salt', (rand() * 10).cast([1]))
Options:
A. 'integer'
B. 'string'
C. 'float'
D. 'boolean'
Common Mistakes
Casting to string or float instead of integer.
Not casting at all, resulting in float values.
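
To see why a salt column helps, here is a minimal plain-Python sketch (no Spark required). It assumes a hypothetical partition count of 4 and uses a simple arithmetic function as a stand-in for a hash partitioner; it is not Spark's actual hashing. Without a salt, every row of a hot key lands in the same partition; with a random salt from 0 to 9, the same rows spread out.

```python
import random
from collections import Counter

NUM_SALTS = 10       # matches the 0-9 salt range used in the task
NUM_PARTITIONS = 4   # hypothetical partition count, chosen for illustration

def partition_for(key, salt=0):
    # Stand-in for a hash partitioner; NOT Spark's real hash function.
    return (key * 31 + salt) % NUM_PARTITIONS

rng = random.Random(0)   # seeded so the run is repeatable
hot_rows = [7] * 1000    # 1000 rows that all share one hot key

# Partition sizes when partitioning on the key alone.
unsalted = Counter(partition_for(k) for k in hot_rows)
# Partition sizes when partitioning on (key, random salt).
salted = Counter(partition_for(k, rng.randrange(NUM_SALTS)) for k in hot_rows)

print("unsalted:", dict(unsalted))
print("salted:  ", dict(salted))
```

The unsalted counter has a single entry holding all 1000 rows, while the salted counter splits them across several partitions.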
3. Fill in the blank
hard

Complete the code to perform a salted join that matches on both the 'id' and 'salt' columns.

Apache Spark
joined_df = df1.join(df2, on=['id', [1]], how='inner')
Options:
A. 'saltVal'
B. 'salted'
C. 'salt_column'
D. 'salt'
Common Mistakes
Using incorrect column names for salt.
Joining only on 'id' without salt.
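
The join in this task assumes both DataFrames already carry a 'salt' column. One common way to arrange that, sketched here in plain Python rather than Spark, is to give each row on the skewed side one random salt and to copy each row on the other side once per salt value. The data, the `NUM_SALTS` value, and the `inner_join` helper below are all hypothetical and exist only for illustration. The point of the sketch is that the salted join returns the same rows as the plain join on the key.

```python
import random

NUM_SALTS = 3  # hypothetical salt range, kept small for readability

left = [(1, "a"), (1, "b"), (1, "c"), (2, "d")]  # skewed side: key 1 is hot
right = [(1, "X"), (2, "Y")]                      # other side of the join

rng = random.Random(42)
# Skewed side: each row gets exactly one random salt.
left_salted = [(k, rng.randrange(NUM_SALTS), v) for k, v in left]
# Other side: each row is copied once per salt so every salt finds a match.
right_salted = [(k, s, v) for k, v in right for s in range(NUM_SALTS)]

def inner_join(l_rows, r_rows, key_len):
    # Nested-loop inner join on the first key_len fields of each row.
    # Returns sorted (key, left_value, right_value) tuples.
    return sorted(
        (l[0], l[-1], r[-1])
        for l in l_rows
        for r in r_rows
        if l[:key_len] == r[:key_len]
    )

plain = inner_join(left, right, 1)                  # join on id only
salted = inner_join(left_salted, right_salted, 2)   # join on (id, salt)

print(plain == salted)
```

If the second side were not replicated across all salt values, rows on the skewed side whose salt has no counterpart would be dropped by an inner join.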
4. Fill in the blank
hard

Fill both blanks to create a salted key by concatenating 'id' and 'salt' as strings.

Apache Spark
from pyspark.sql.functions import concat, col
salted_df = df.withColumn('salted_key', concat(col([1]), col([2])))
Options:
A. 'id'
B. 'salt'
C. 'key'
D. 'value'
Common Mistakes
Using wrong column names like 'key' or 'value'.
Concatenating only one column.
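
A caveat worth keeping in mind when building a concatenated key: with single-digit salts (0 to 9, as in task 2) the plain concatenation is unambiguous, but with a larger salt range two different (id, salt) pairs can produce the same string. The plain-Python sketch below uses a hypothetical `salted_key` helper to show the collision and how a separator avoids it. In PySpark, `concat_ws` is one way to add such a separator.

```python
def salted_key(id_value, salt, sep=""):
    # Build a string key from id and salt, optionally with a separator.
    return f"{id_value}{sep}{salt}"

# Multi-digit salts: two different (id, salt) pairs collapse into one key.
print(salted_key(1, 23), salted_key(12, 3))            # 123 123

# A separator keeps the two pairs distinct.
print(salted_key(1, 23, "_"), salted_key(12, 3, "_"))  # 1_23 12_3
```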
5. Fill in the blank
hard

Fill all three blanks to find the skewed keys: group the DataFrame by key, count the rows per key, and keep only the keys whose count is greater than 1000.

Apache Spark
from pyspark.sql.functions import col
skewed_keys = df.groupBy([1]).count().filter(col('count') [2] [3])
Options:
A. 'id'
B. >
C. 1000
D. 'salt'
Common Mistakes
Filtering with wrong operators like '<'.
Grouping by wrong columns like 'salt'.
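
The group, count, and filter logic in this task can be mirrored in plain Python with `collections.Counter`. This is only a sketch of the idea on made-up data, not the Spark code itself. Note that a key with exactly 1000 rows is not reported, because the condition is strictly greater than 1000.

```python
from collections import Counter

THRESHOLD = 1000  # same cutoff as the task

# Made-up ids: key 1 is heavily repeated, key 2 sits exactly on the cutoff.
ids = [1] * 1500 + [2] * 1000 + [3] * 20

counts = Counter(ids)  # rows per key, like groupBy('id').count()
skewed = {k: c for k, c in counts.items() if c > THRESHOLD}

print(skewed)  # {1: 1500}
```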