Apache Spark · ~10 mins

Inner, left, right, and full outer joins in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to perform an inner join between two DataFrames df1 and df2 on the column 'id'.

Apache Spark
joined_df = df1.join(df2, on='id', how='[1]')
A. inner
B. outer
C. right
D. left
Common Mistakes
Using 'left' or 'right' instead of 'inner' will include unmatched rows from one side.
Using 'outer' will include all rows from both DataFrames.
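To see why 'inner' is the right choice, here is a plain-Python sketch of inner-join semantics (no Spark session needed; the sample rows are invented for illustration):

```python
# Toy rows standing in for the two DataFrames.
df1 = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
df2 = [{"id": 2, "score": 90}, {"id": 3, "score": 70}]

# how='inner': keep only rows whose 'id' appears on BOTH sides.
inner = [{**r1, **r2} for r1 in df1 for r2 in df2 if r1["id"] == r2["id"]]
# Only id=2 matches; id=1 (left-only) and id=3 (right-only) are dropped.
```

The equivalent Spark call would be `df1.join(df2, on='id', how='inner')`.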
Task 2: Fill in the blank (medium)

Complete the code to perform a left join between df1 and df2 on the column 'user_id'.

Apache Spark
joined_df = df1.join(df2, on='user_id', how='[1]')
A. left
B. right
C. inner
D. full
Common Mistakes
Using 'inner' will exclude unmatched left rows.
Using 'right' will keep all right rows instead.
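A left join keeps every left row and fills in nulls where the right side has no match. A plain-Python sketch of those semantics (sample rows invented for illustration):

```python
df1 = [{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}]
df2 = [{"user_id": 2, "city": "Oslo"}]

# how='left': every left row survives; right-side columns become None when unmatched.
right_by_key = {r["user_id"]: r for r in df2}
left = [{**r1, "city": right_by_key.get(r1["user_id"], {}).get("city")} for r1 in df1]
# user_id=1 has no match in df2, so its 'city' is None rather than being dropped.
```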
Task 3: Fill in the blank (hard)

Complete the code to perform a right join between dfA and dfB on 'key'.

Apache Spark
result = dfA.join(dfB, on='key', how='[1]')
A. inner
B. left
C. right
D. outer
Common Mistakes
Using 'left' will keep all left rows, not right.
Using 'inner' excludes unmatched rows.
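A right join is the mirror image of a left join: every right row survives. A plain-Python sketch of the semantics (sample rows invented for illustration):

```python
dfA = [{"key": 1, "a": "x"}]
dfB = [{"key": 1, "b": "p"}, {"key": 2, "b": "q"}]

# how='right': every right row survives; left-side columns become None when unmatched.
left_by_key = {r["key"]: r for r in dfA}
result = [
    {"key": r2["key"], "a": left_by_key.get(r2["key"], {}).get("a"), "b": r2["b"]}
    for r2 in dfB
]
# key=2 exists only in dfB, so it is kept with 'a' set to None.
```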
Task 4: Fill in the blanks (hard)

Fill both blanks to perform a full outer join between dfX and dfY on 'id' and select only rows where 'id' is not null.

Apache Spark
joined = dfX.join(dfY, on='id', how='[1]')
filtered = joined.filter(joined.id [2] None)
A. outer
B. ==
C. !=
D. left
Common Mistakes
Using 'left' or 'inner' will exclude some rows.
Filtering with '== None' selects only the null ids, which is usually unwanted; in real PySpark code, prefer the explicit null checks isNull() and isNotNull() over comparisons with None.
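A full outer join keeps keys from both sides. A plain-Python sketch of the two steps (sample rows invented for illustration):

```python
dfX = [{"id": 1, "v": "x1"}]
dfY = [{"id": 2, "w": "y2"}]

# how='outer': the result covers keys from BOTH sides, filling None for missing columns.
x_by = {r["id"]: r for r in dfX}
y_by = {r["id"]: r for r in dfY}
keys = sorted(set(x_by) | set(y_by))
outer = [{"id": k, "v": x_by.get(k, {}).get("v"), "w": y_by.get(k, {}).get("w")} for k in keys]

# The exercise's second step: keep rows whose 'id' is not null.
filtered = [row for row in outer if row["id"] is not None]
```

Note that in actual PySpark, comparing a Column to None with `!=` evaluates to SQL NULL rather than a boolean, so the idiomatic null check is `joined.filter(joined.id.isNotNull())`.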
Task 5: Fill in the blanks (hard)

Fill all three blanks to create a dictionary of user names and their ages from dfUsers where age is greater than 20.

Apache Spark
user_dict = {row['[1]']: row['[2]'] for row in dfUsers.collect() if row['[3]'] > 20}
A. name
B. age
C. id
Common Mistakes
Using 'id' instead of 'name' for keys.
Filtering on 'name' instead of 'age'.
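The completed comprehension can be checked with plain dicts standing in for the Row objects that `dfUsers.collect()` would return (the sample users are invented):

```python
# Each dict plays the role of a collected Row.
rows = [
    {"id": 1, "name": "Ann", "age": 25},
    {"id": 2, "name": "Bob", "age": 18},
]

# Keys come from 'name', values from 'age', filtered on 'age' > 20.
user_dict = {row["name"]: row["age"] for row in rows if row["age"] > 20}
# Bob (age 18) fails the filter, so only Ann remains.
```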