Given two DataFrames df1 and df2, what is the result of df1.merge(df2, how='right', on='key')?
import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})
result = df1.merge(df2, how='right', on='key')
print(result)
Remember, a right join keeps all rows from the right DataFrame and matches rows from the left.
The right join keeps all keys from df2. Keys 'B', 'C', and 'D' are in df2. For 'D', there is no match in df1, so val1 is NaN.
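The explanation can be checked by running the question's own code. The sketch below repeats it and notes the rows to expect in comments. Because of the NaN, val1 is stored as floating point.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

# Right join: every row of df2 is kept, in df2's key order.
result = df1.merge(df2, how='right', on='key')
print(result)

# Expected rows:
#   B -> val1 = 2.0, val2 = 4
#   C -> val1 = 3.0, val2 = 5
#   D -> val1 = NaN, val2 = 6   (no 'D' in df1)
# Key 'A' exists only in df1, so it does not appear.
```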
Choose the correct description of what a right join does in pandas merge.
Think about which DataFrame's rows are always kept in a right join.
A right join keeps all rows from the right DataFrame and matches rows from the left DataFrame where keys match. Unmatched left rows are dropped.
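One way to see which rows were kept and which were dropped is merge's indicator=True option, which adds a _merge column naming the source of each row. This is a minimal sketch that reuses the sample data from the first question, not part of the quiz itself.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
result = df1.merge(df2, how='right', on='key', indicator=True)
print(result[['key', '_merge']])

# 'B' and 'C' are marked 'both'; 'D' is marked 'right_only'.
# 'A' is a left-only key, so a right join drops it entirely.
dropped_left_keys = sorted(set(df1['key']) - set(result['key']))
print(dropped_left_keys)
```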
Given two DataFrames df1 and df2, which code snippet correctly performs a right join on columns 'id' and 'code'?
import pandas as pd
# df1 and df2 are predefined DataFrames
Check the correct syntax for joining on multiple columns with the same names.
When joining on multiple columns that have the same names in both DataFrames, pass them as a list: on=['id', 'code']. left_on and right_on are for the case where the key columns are named differently in the two DataFrames.
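Both forms can be sketched as follows. The sample values and the renamed columns item_id and item_code are hypothetical, made up for illustration because the quiz does not define df1 and df2.

```python
import pandas as pd

# Hypothetical sample data; only the column names 'id' and 'code' come from the question.
df1 = pd.DataFrame({'id': [1, 2, 3], 'code': ['x', 'y', 'z'], 'val1': [10, 20, 30]})
df2 = pd.DataFrame({'id': [2, 3, 4], 'code': ['y', 'q', 'w'], 'val2': [200, 300, 400]})

# Same column names in both frames: pass a list to `on`.
result = df1.merge(df2, how='right', on=['id', 'code'])
print(result)
# Only (2, 'y') matches on BOTH columns; (3, 'q') and (4, 'w') get NaN in val1.

# Different column names: pair them up with left_on / right_on.
df2_renamed = df2.rename(columns={'id': 'item_id', 'code': 'item_code'})
alt = df1.merge(
    df2_renamed,
    how='right',
    left_on=['id', 'code'],
    right_on=['item_id', 'item_code'],
)
print(alt)
```

Note that a row matches only when every listed column agrees, so id 3 does not match here even though it appears in both frames.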
You have two DataFrames: df1 (small) and df2 (very large). You want to perform a right join. Which approach optimizes performance?
Think about which DataFrame should be on the left for better performance.
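The intended answer is to make df2 the left DataFrame and perform a left join, df2.merge(df1, how='left', on='key'), which returns the same rows as the right join with the columns in a different order. Treat the claim that left joins are faster as a rule of thumb rather than a guarantee: any difference is usually small and depends on the pandas version and the data, so benchmark both forms before relying on it.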
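The two forms can be checked against each other on a small example. This sketch assumes unique join keys and reuses the sample data from the first question. It shows only that the results agree once the columns are reordered and makes no claim about speed.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})   # "small" frame
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})   # stands in for the large frame

right_join = df1.merge(df2, how='right', on='key')   # columns: key, val1, val2
left_join = df2.merge(df1, how='left', on='key')     # columns: key, val2, val1

# Same rows in the same order; only the column order differs.
left_reordered = left_join[right_join.columns]
pd.testing.assert_frame_equal(right_join, left_reordered)  # raises if they differ
```

To compare performance on real data, time both calls (for example with the standard timeit module) rather than assuming one is faster.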
Consider this code:
import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['a', 'b', 'c'], 'val2': [3, 4, 5]})
result = df1.merge(df2, how='right', on='key')
Why does result contain NaNs in val1 for all rows?
Check if the join keys match exactly including case sensitivity.
Join keys are case-sensitive. 'A' is not equal to 'a', so no rows match. The right join keeps all rows from df2, but val1 is NaN in every row because df1 has no matching keys.
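One common fix is to normalize the case of the key column before merging. The sketch below lowercases df1's keys with str.lower(). Whether lowercasing is appropriate depends on the data, so treat it as one possible approach rather than the required one.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['a', 'b', 'c'], 'val2': [3, 4, 5]})

# Original behaviour: no key matches, so val1 is NaN in every row.
broken = df1.merge(df2, how='right', on='key')
print(broken['val1'].isna().all())

# Normalize case on a copy of df1, then merge.
df1_norm = df1.assign(key=df1['key'].str.lower())
fixed = df1_norm.merge(df2, how='right', on='key')
print(fixed)
# 'a' and 'b' now match (val1 = 1.0 and 2.0); 'c' still has no partner, so val1 is NaN.
```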