Challenge - 5 Problems

❓ query_result

intermediate

What is the output of this right join?

Given two DataFrames df1 and df2, what is the result of df1.merge(df2, how='right', on='key')?

Pandas

import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

result = df1.merge(df2, how='right', on='key')
print(result)

A.
  key  val1  val2
0   A   1.0   NaN
1   B   2.0     4
2   D   NaN     6

B.
  key  val1  val2
0   A   1.0   NaN
1   B   2.0     4
2   C   3.0     5

C.
  key  val1  val2
0   B   2.0     4
1   C   3.0     5
2   D   NaN     6

D.
  key  val1  val2
0   B   2.0     4
1   C   3.0     5
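
When a merge result is hard to reason about by eye, pandas can label where each row came from. The sketch below rebuilds the df1 and df2 from the question and adds indicator=True, which appends a _merge column to the result. The variable name labeled is ours and is not part of the challenge.

```python
import pandas as pd

# Same frames as in the question.
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

# indicator=True adds a '_merge' column whose values are
# 'left_only', 'right_only', or 'both'.
labeled = df1.merge(df2, how='right', on='key', indicator=True)
print(labeled)
```

Comparing the _merge column against the candidate outputs is a quick way to check which keys a right join keeps and which rows receive NaN.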

🧠 Conceptual

intermediate

Which statement best describes a right join?

Choose the correct description of what a right join does in pandas merge.

A. Returns all rows from the left DataFrame and matching rows from the right DataFrame.

B. Returns all rows from the right DataFrame and matching rows from the left DataFrame.

C. Returns only rows that have matching keys in both DataFrames.

D. Returns all rows from both DataFrames, filling missing matches with NaN.
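
Each description above corresponds to one value of the how parameter. As a rough illustration, the sketch below runs all four on two small frames and collects the keys that survive each merge, so the descriptions can be matched to behavior by experiment. The frames left and right and the dictionary keys_by_how are hypothetical names chosen for this example.

```python
import pandas as pd

# Hypothetical sample data: 'A' exists only on the left, 'D' only on the right.
left = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

# Record which keys each join type keeps.
keys_by_how = {}
for how in ('left', 'right', 'inner', 'outer'):
    merged = left.merge(right, how=how, on='key')
    keys_by_how[how] = merged['key'].tolist()
    print(f"{how:>5}: {keys_by_how[how]}")
```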

📝 Syntax

advanced

Which code correctly performs a right join on columns 'id' and 'code'?

Given two DataFrames df1 and df2 that both contain the columns 'id' and 'code', which snippet is the most concise correct way to perform a right join on both columns?

Pandas

import pandas as pd

# df1 and df2 are predefined DataFrames

A. df1.merge(df2, how='right', left_on=['id', 'code'], right_on=['id', 'code'])

B. df1.merge(df2, how='right', left_on='id', right_on='code')

C. df1.merge(df2, how='right', on='id', on='code')

D. df1.merge(df2, how='right', on=['id', 'code'])
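
To experiment with multi-column joins, the following sketch builds two small frames that share the column names 'id' and 'code'. The sample values and the names by_on and by_sides are assumptions made for this illustration; the challenge only states that df1 and df2 are predefined.

```python
import pandas as pd

# Hypothetical data: both frames share the column names 'id' and 'code'.
df1 = pd.DataFrame({'id': [1, 2], 'code': ['x', 'y'], 'val1': [10, 20]})
df2 = pd.DataFrame({'id': [2, 3], 'code': ['y', 'z'], 'val2': [200, 300]})

# A list passed to `on` joins on the combination of both columns.
by_on = df1.merge(df2, how='right', on=['id', 'code'])

# The same column names spelled out per side: more verbose, same frame.
by_sides = df1.merge(
    df2, how='right', left_on=['id', 'code'], right_on=['id', 'code']
)

print(by_on)
print(by_on.equals(by_sides))
```

Passing a single column to left_on and a different one to right_on compares 'id' on one side with 'code' on the other, which is a different join. Repeating the on keyword is rejected by Python as a syntax error before pandas runs at all.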

❓ optimization

advanced

How to optimize a right join when the right DataFrame is very large?

You have two DataFrames: df1 (small) and df2 (very large). You want to perform a right join. Which approach optimizes performance?

A. Perform a left join with df2 as left and df1 as right, then reorder (or rename) the columns as needed.

B. Perform a right join directly with df1.merge(df2, how='right') without changes.

C. Convert both DataFrames to dictionaries and merge manually in Python.

D. Sort both DataFrames by join keys before merging.
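
A right join and a left join with the operands swapped keep the same rows, so either form can be benchmarked against the other. The sketch below only demonstrates that equivalence on tiny hypothetical frames; it makes no claim about speed, which depends on the pandas version and the data and is best measured with a timer on the real frames.

```python
import pandas as pd

# Hypothetical stand-ins: df1 is the small frame, df2 plays the large one.
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

# Direct right join.
right_join = df1.merge(df2, how='right', on='key')

# Swapped operands with a left join, then the columns reordered to match.
swapped = df2.merge(df1, how='left', on='key')[right_join.columns.tolist()]

print(right_join.equals(swapped))
```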

🔧 Debug

expert

Why does this right join produce unexpected NaNs?

Consider this code:

import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['a', 'b', 'c'], 'val2': [3, 4, 5]})
result = df1.merge(df2, how='right', on='key')

Why does result contain NaNs in val1 for all rows?

A. Because the keys have different cases ('A' vs 'a'), so no matches occur.

B. Because right join only keeps rows from the left DataFrame.

C. Because the 'on' parameter is missing in the merge call.

D. Because the DataFrames have different column names for the join key.
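
pandas compares string keys exactly, so a useful debugging step is to look at the key values on both sides and normalize them before merging. The sketch below is one possible approach, assuming that lowercasing the left keys is acceptable for the data at hand; the names raw and fixed are ours.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['a', 'b', 'c'], 'val2': [3, 4, 5]})

# Merge as written in the question.
raw = df1.merge(df2, how='right', on='key')

# Normalize the left keys to lowercase on a copy, then merge again.
fixed = df1.assign(key=df1['key'].str.lower()).merge(df2, how='right', on='key')

print(raw)
print(fixed)
```

Comparing the two printed frames shows how many val1 entries are still missing once the keys use the same case.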
