Given two pandas DataFrames df1 and df2:
df1 = pd.DataFrame({"key": [1, 2, 3], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val2": ["X", "Y", "Z"]})
What is the result of pd.merge(df1, df2, on='key', how='outer')?
import pandas as pd
df1 = pd.DataFrame({"key": [1, 2, 3], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val2": ["X", "Y", "Z"]})
result = pd.merge(df1, df2, on='key', how='outer')
print(result.sort_values('key').reset_index(drop=True))
Outer join keeps all keys from both DataFrames, filling missing values with NaN.
Outer join returns all rows from both DataFrames. For keys missing in one DataFrame, the corresponding columns are filled with NaN: key 1 has no val2, and key 4 has no val1.
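To make the fill behaviour visible, here is a minimal sketch using the same two DataFrames. It adds the indicator=True argument of pd.merge, which is not part of the original question, to label where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2, 3], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val2": ["X", "Y", "Z"]})

# indicator=True adds a "_merge" column naming the source of each row:
# "left_only", "right_only", or "both".
result = (
    pd.merge(df1, df2, on="key", how="outer", indicator=True)
    .sort_values("key")
    .reset_index(drop=True)
)

print(result)

# isna() shows which cells were filled because the key was missing on one side.
print(result[["val1", "val2"]].isna())
```

Rows marked left_only or right_only are the ones that carry NaN in the other DataFrame's columns.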
Consider these DataFrames with duplicate keys:
df1 = pd.DataFrame({"key": [1, 2, 2], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 2, 3], "val2": ["X", "Y", "Z"]})
How many rows will pd.merge(df1, df2, on='key', how='outer') have?
import pandas as pd
df1 = pd.DataFrame({"key": [1, 2, 2], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 2, 3], "val2": ["X", "Y", "Z"]})
result = pd.merge(df1, df2, on='key', how='outer')
print(len(result))
Remember that duplicates cause a Cartesian product for matching keys.
For key=2, df1 has 2 rows and df2 has 2 rows, so the merge produces 2 × 2 = 4 rows for that key. Adding one row for key=1 and one for key=3 gives 6 rows in total.
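The same arithmetic can be checked in code. This is a rough sketch: the variable names (left_counts, right_counts, expected_rows) are illustrative, and treating a key missing from one side as a count of 1 is just a way to express "the unmatched rows still appear once each".

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2, 2], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 2, 3], "val2": ["X", "Y", "Z"]})

# Count how often each key occurs on each side.
left_counts = df1["key"].value_counts()
right_counts = df2["key"].value_counts()

# Align both counts on the union of keys; absent keys get a count of 0.
all_keys = left_counts.index.union(right_counts.index)
left = left_counts.reindex(all_keys, fill_value=0)
right = right_counts.reindex(all_keys, fill_value=0)

# In an outer join, a key missing on one side still yields its rows from the
# other side, so a 0 count behaves like 1 in the per-key product.
expected_rows = int((left.clip(lower=1) * right.clip(lower=1)).sum())

merged = pd.merge(df1, df2, on="key", how="outer")
print(expected_rows, len(merged))
```

If the row count is larger than expected in real data, counting keys per side like this is a quick way to find which keys are duplicated.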
Choose the code that correctly performs an outer join on df1 and df2 using pandas.
Check the correct function and parameters for outer join in pandas.
Option A uses pd.merge with how='outer' and on='key', which is the correct syntax for outer join on a key column.
Option A is valid, but it joins on the index by default: how='outer' is accepted, yet without explicit keys the rows are aligned by index rather than by the key column.
Option A uses pd.concat which concatenates rows or columns, not a join.
Option A is invalid because merge method on DataFrame requires on parameter to specify join keys.
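The option code itself is not reproduced above, so the following sketch only assumes typical forms of the three approaches the explanations mention (pd.merge on a key, DataFrame.join, and pd.concat). The lsuffix and rsuffix values are illustrative; join needs them here because both frames have a column named key.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2, 3], "val1": ["A", "B", "C"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "val2": ["X", "Y", "Z"]})

# Outer join on the "key" column: one row per key in the union of keys.
merged = pd.merge(df1, df2, on="key", how="outer")

# DataFrame.join aligns on the index (0, 1, 2 here), not on "key".
joined = df1.join(df2, how="outer", lsuffix="_left", rsuffix="_right")

# pd.concat stacks the rows; it does not match keys at all.
stacked = pd.concat([df1, df2], ignore_index=True)

print(len(merged), len(joined), len(stacked))
print(list(joined.columns))
```

Only the first call matches rows by key value; the other two run without error but answer a different question.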
Given these DataFrames:
df1 = pd.DataFrame({"key": [1, 2], "val1": ["A", "B"]})
df2 = pd.DataFrame({"key": ["1", "2"], "val2": ["X", "Y"]})
Why does pd.merge(df1, df2, on='key', how='outer') produce NaNs in all columns except the keys?
import pandas as pd
df1 = pd.DataFrame({"key": [1, 2], "val1": ["A", "B"]})
df2 = pd.DataFrame({"key": ["1", "2"], "val2": ["X", "Y"]})
result = pd.merge(df1, df2, on='key', how='outer')
print(result)
Check the data types of the key columns in both DataFrames.
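The keys have different types: integers in df1 and strings in df2. The integer 1 never equals the string "1", so no keys match and every row appears with NaNs in the other DataFrame's columns. Note that recent pandas versions refuse this merge and raise a ValueError about merging int64 and object columns instead of silently producing NaNs. Either way, the fix is to convert both key columns to the same dtype before merging.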
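A minimal sketch of the usual fix, assuming the string keys are all valid integers: inspect the dtypes, cast one key column to match the other, then merge. The name df2_fixed is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2], "val1": ["A", "B"]})
df2 = pd.DataFrame({"key": ["1", "2"], "val2": ["X", "Y"]})

# The mismatch is visible in the dtypes: an integer column versus a string column.
print(df1["key"].dtype, df2["key"].dtype)

# Cast the string keys to integers on a copy so df2 itself is left unchanged.
df2_fixed = df2.assign(key=df2["key"].astype("int64"))

result = (
    pd.merge(df1, df2_fixed, on="key", how="outer")
    .sort_values("key")
    .reset_index(drop=True)
)
print(result)
```

Casting in the other direction (integers to strings) also works; which one is appropriate depends on what the key really represents.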
When performing an outer join with pd.merge on two DataFrames with overlapping and non-overlapping keys, how does pandas handle the index of the resulting DataFrame?
Think about how pandas resets or preserves indexes after merge.
By default, pd.merge returns a DataFrame with a new default integer index. It does not preserve the original indexes unless left_index=True and right_index=True are specified.
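The difference is easy to see with non-default index labels. In this sketch the labels "a", "b", "c" and "x", "y", "z" are made up for illustration, and the names on_column and on_index are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"key": [1, 2, 3], "val1": ["A", "B", "C"]}, index=["a", "b", "c"]
)
df2 = pd.DataFrame(
    {"key": [2, 3, 4], "val2": ["X", "Y", "Z"]}, index=["x", "y", "z"]
)

# Merging on a column ignores the original index labels;
# the result gets a fresh 0..n-1 integer index.
on_column = pd.merge(df1, df2, on="key", how="outer")
print(on_column.index.tolist())

# Merging on the index uses the index values as join keys,
# so they survive as the index of the result.
on_index = pd.merge(
    df1.set_index("key"),
    df2.set_index("key"),
    left_index=True,
    right_index=True,
    how="outer",
)
print(on_index.index.tolist())
```

If the original labels matter after a column-based merge, one option is to call reset_index() on each frame first so the labels travel along as an ordinary column.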