Self joins in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to join the DataFrame to itself using the 'id' column.

Apache Spark
joined_df = a.join(b, a['id'] [1] b['id'])
Options:
A. !=
B. ==
C. <
D. >
Common Mistakes
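Using '!=' instead of '==' pairs each row with every row that has a different id, so no row is matched to its own copy.
Using '<' or '>' matches rows by id order rather than by equality, which is not the join asked for here.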
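
To see why the comparison operator matters, here is a small plain-Python sketch (not Spark code) that pairs the rows of a table with themselves, once with == and once with !=. The three-row table and the helper name self_join are assumptions made for illustration only.

```python
from operator import eq, ne

# Hypothetical rows standing in for a small DataFrame with an 'id' column.
rows = [{"id": 1, "value": "x"}, {"id": 2, "value": "y"}, {"id": 3, "value": "z"}]

def self_join(table, compare):
    """Pair every row with every row of the same table,
    keeping the pairs for which compare(a.id, b.id) is true."""
    return [(a, b) for a in table for b in table if compare(a["id"], b["id"])]

equal_pairs = self_join(rows, eq)    # each row meets only its own copy
unequal_pairs = self_join(rows, ne)  # each row meets every other row

print(len(equal_pairs))    # 3
print(len(unequal_pairs))  # 6
```

With '!=' the result is not empty; it holds every pair except the intended ones.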
2. Fill in the blank (medium)

Complete the code to select columns 'a.id' and 'b.value' after the self join.

Apache Spark
result = joined_df.select([1])
Options:
A. ['a.id', 'b.value']
B. a.id, b.value
C. ['id', 'value']
D. ['a.id', 'value']
Common Mistakes
Passing a string with comma-separated columns instead of a list.
Using column names without alias prefixes, which is ambiguous because both sides of a self join carry the same column names.
3. Fill in the blank (hard)

Fix the error in the join condition to avoid ambiguous column references.

Apache Spark
joined_df = x.join(y, x['id'] [1] )
Options:
A. == y['id']
B. ==
C. !=
D. == x['id']
Common Mistakes
Using x['id'] on both sides (option D) compares a column with itself and leaves the reference ambiguous.
Using '!=' instead of '==' causes wrong join results.
4. Fill in the blanks (hard)

Fill both blanks to create a self join where 'a.manager_id' equals 'b.id'.

Apache Spark
joined_df = a.join(b, a['[1]'] [2] b['id'])
Options:
A. manager_id
B. ==
C. !=
D. employee_id
Common Mistakes
Using '!=' instead of '==' pairs each employee with every row that is not their manager.
Using a wrong column name such as 'employee_id' instead of 'manager_id'.
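
The manager lookup can be sketched in plain Python (not Spark code) to show which rows an inner self join on manager_id and id keeps. The employee table below is hypothetical; side "a" plays the employee and side "b" plays the manager.

```python
# Hypothetical employee table; manager_id points at another row's id
# (None for the employee at the top of the hierarchy).
employees = [
    {"id": 1, "name": "Ana", "manager_id": None},
    {"id": 2, "name": "Ben", "manager_id": 1},
    {"id": 3, "name": "Cy", "manager_id": 1},
    {"id": 4, "name": "Dee", "manager_id": 2},
]

# Inner self join: keep a pair only when a's manager_id equals b's id.
pairs = [
    (a["name"], b["name"])
    for a in employees
    for b in employees
    if a["manager_id"] == b["id"]
]

print(pairs)  # [('Ben', 'Ana'), ('Cy', 'Ana'), ('Dee', 'Ben')]
```

Ana has no manager, so she appears only on the manager side, as she would in an inner join.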
5. Fill in the blanks (hard)

Fill all three blanks to create a dictionary of employee names and their manager names.

Apache Spark
emp_mgr = {row['[1]']: row['[2]'] for row in joined_df.select('[3]', 'manager_name').collect()}
Options:
A. employee_name
B. manager_name
D. id
Common Mistakes
Using 'id' instead of 'employee_name' as key.
Swapping key and value columns.
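
A minimal sketch of the dictionary comprehension, and of what swapping key and value does to it. Plain dicts stand in here for the Row objects that collect() returns (both support row['column'] lookup), and the names in the list are hypothetical.

```python
# Hypothetical stand-in for
# joined_df.select('employee_name', 'manager_name').collect()
collected = [
    {"employee_name": "Ben", "manager_name": "Ana"},
    {"employee_name": "Cy", "manager_name": "Ana"},
    {"employee_name": "Dee", "manager_name": "Ben"},
]

# Employee name as key, manager name as value: one entry per employee.
emp_mgr = {row["employee_name"]: row["manager_name"] for row in collected}

# Swapped: managers with several reports collapse to a single entry.
swapped = {row["manager_name"]: row["employee_name"] for row in collected}

print(emp_mgr)  # {'Ben': 'Ana', 'Cy': 'Ana', 'Dee': 'Ben'}
print(swapped)  # {'Ana': 'Cy', 'Ben': 'Dee'}
```

Because dictionary keys are unique, the swapped version silently drops Ben as one of Ana's reports.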