Apache Spark · ~10 mins

Multi-column joins in Apache Spark - Interactive Code Practice

Practice: 5 Tasks
Answer the questions below.
Task 1: Fill in the blank (easy)

Complete the code to join two DataFrames on multiple columns.

Apache Spark
joined_df = df1.join(df2, on=[1])
A. 'date'
B. ['id', 'date']
C. 'id'
D. df2.id == df1.id
Common Mistakes
- Passing a single column name as a string when multiple columns are needed.
- Using a condition expression instead of a list for the 'on' parameter.
Task 2: Fill in the blank (medium)

Complete the code to perform an inner join on columns 'user_id' and 'order_id'.

Apache Spark
result = df_orders.join(df_users, on=[1], how='inner')
A. 'order_id'
B. df_orders.user_id == df_users.user_id
C. ['user_id', 'order_id']
D. 'user_id'
Common Mistakes
- Using a single column name instead of a list for multiple columns.
- Not specifying the join type when needed.
Task 3: Fill in the blank (hard)

Fix the error in the join condition to correctly join on 'city' and 'state'.

Apache Spark
joined = df_a.join(df_b, (df_a.city == df_b.city) & (df_a.[1] == df_b.state))
A. state
B. city
C. zip
D. country
Common Mistakes
- Using the wrong column name on one side of the condition.
- Mixing column names that don't exist in the DataFrame.
Task 4: Fill in the blank (hard)

Fill both blanks to create a dictionary for joining on 'country' and 'city' with different column names.

Apache Spark
join_cols = { [1]: [2] }
A. 'country'
B. 'city'
C. 'country_name'
D. 'city_name'
Common Mistakes
- Using a list instead of a dictionary for columns with different names.
- Swapping keys and values in the dictionary.
Task 5: Fill in the blank (hard)

Fill all three blanks to create a join condition using multiple columns with different names.

Apache Spark
join_condition = (df1.[1] == df2.[2]) & (df1.[3] == df2.city)
A. country
B. country_name
C. state
D. state_name
Common Mistakes
- Mixing up column names between DataFrames.
- Using | instead of & for combining conditions.