Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to join the DataFrame to itself using the 'id' column.
Apache Spark
joined_df = a.join(b, a['id'] [1] b['id'])
Common Mistakes
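Using '!=' instead of '==' pairs each row with every row that has a different 'id', not with its match.
Using '<' or '>' turns the equality join into a range join, which also pairs rows with different ids.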
Explanation: Use '==' to join rows where the 'id' values are equal in both aliases.
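The effect of the operator can be checked without a cluster. The following is a plain-Python sketch, not Spark code: it models an inner self join as a filtered cross product over a small, made-up list of rows. The `rows` data and the `self_join` helper are illustrative assumptions and do not appear in the exercise.

```python
from itertools import product
import operator

# Hypothetical sample data standing in for a DataFrame with an 'id' column.
rows = [{"id": 1, "value": "x"}, {"id": 2, "value": "y"}, {"id": 3, "value": "z"}]

def self_join(rows, condition):
    """Model an inner self join: keep each (left, right) pair that satisfies the condition."""
    return [(a, b) for a, b in product(rows, rows) if condition(a["id"], b["id"])]

equal_pairs = self_join(rows, operator.eq)    # models a['id'] == b['id']
unequal_pairs = self_join(rows, operator.ne)  # models a['id'] != b['id']

print(len(equal_pairs))    # 3: each row pairs only with itself
print(len(unequal_pairs))  # 6: each row pairs with every other row
```

With unique ids, the equality condition yields one pair per row, while the inequality condition yields every remaining pair of the cross product.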
2. Fill in the blank (medium)
Complete the code to select columns 'a.id' and 'b.value' after the self join.
Apache Spark
result = joined_df.select([1])
Common Mistakes
Passing one comma-separated string such as 'a.id, b.value' instead of a separate string for each column.
Using bare column names such as 'id' without the alias prefix, which is ambiguous after a self join.
Explanation: The select method accepts column names as strings (or Column objects), passed either as separate arguments or as a single list.
3. Fill in the blank (hard)
Fix the error in the join condition to avoid ambiguous column references.
Apache Spark
joined_df = x.join(y, x['id'] [1] )
Common Mistakes
Using df['id'] on both sides, so Spark cannot tell which side of the join each reference belongs to.
Using '!=' instead of '==', which pairs rows whose ids differ.
Explanation: The right side of the condition must reference the column through its alias to avoid ambiguity.
4. Fill in the blanks (hard)
Fill both blanks to create a self join where 'a.manager_id' equals 'b.id'.
Apache Spark
joined_df = a.join(b, a['[1]'] [2] b['id'])
Common Mistakes
Using '!=' instead of '==' pairs each employee with everyone except their manager.
Using a column that is not part of the condition, such as 'employee_id', instead of 'manager_id'.
Explanation: Join on 'manager_id' from alias 'a' equal to 'id' from alias 'b'.
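To see which rows this condition pairs up, the join can be modeled in plain Python. This is a sketch under stated assumptions rather than Spark code: the `employees` list and its names are invented, and the nested comprehension stands in for the inner join, with `a` playing the employee and `b` the manager.

```python
# Hypothetical employee table: each row points at its manager through manager_id.
employees = [
    {"id": 1, "name": "Mei", "manager_id": None},
    {"id": 2, "name": "Raj", "manager_id": 1},
    {"id": 3, "name": "Ana", "manager_id": 2},
]

# Models a.join(b, a['manager_id'] == b['id']): 'a' is the employee, 'b' the manager.
pairs = [
    (a["name"], b["name"])
    for a in employees
    for b in employees
    if a["manager_id"] is not None and a["manager_id"] == b["id"]
]

print(pairs)  # [('Raj', 'Mei'), ('Ana', 'Raj')]
```

The employee with no manager drops out of the result, which mirrors how an inner join treats a null 'manager_id': a null never compares equal to any 'id'.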
5. Fill in the blanks (hard)
Fill all three blanks to create a dictionary of employee names and their manager names.
Apache Spark
emp_mgr = {row['[1]']: row['[2]'] for row in joined_df.select('[3]', 'manager_name').collect()}
Common Mistakes
Using 'id' instead of 'employee_name' as key.
Swapping key and value columns.
Explanation: Map employee_name to manager_name using the selected columns.
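The comprehension pattern can be tried without Spark. In this sketch, plain dictionaries stand in for the Row objects that collect() returns (both support row['column'] lookups). The `collected` list and its names are invented for illustration.

```python
# Hypothetical stand-in for the rows returned by
# joined_df.select('employee_name', 'manager_name').collect().
collected = [
    {"employee_name": "Ana", "manager_name": "Raj"},
    {"employee_name": "Ben", "manager_name": "Raj"},
    {"employee_name": "Raj", "manager_name": "Mei"},
]

# Key on the employee and store the manager as the value.
emp_mgr = {row["employee_name"]: row["manager_name"] for row in collected}

# Swapping key and value loses data: two employees share one manager,
# so the later entry overwrites the earlier one.
swapped = {row["manager_name"]: row["employee_name"] for row in collected}

print(emp_mgr)  # {'Ana': 'Raj', 'Ben': 'Raj', 'Raj': 'Mei'}
print(swapped)  # {'Raj': 'Ben', 'Mei': 'Raj'}
```

Keying on the employee keeps one entry per row as long as employee names are unique, whereas keying on the manager collapses every team into a single entry.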