How to Use Outer Merge in pandas for DataFrames
Use `pandas.merge()` with `how='outer'` to combine two DataFrames while keeping every row from both. A row whose key has no match on the other side is still included, with `NaN` in the columns that come from the missing side. The merge can be done on specified columns or on the indexes.
Syntax
The basic syntax for an outer merge in pandas is shown below. `how` defaults to `'inner'`, so `how='outer'` must be passed explicitly:

```python
pandas.merge(left, right, how='outer', on=None, left_on=None, right_on=None)
```
Explanation:
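- `left`: The first DataFrame.
- `right`: The second DataFrame.
- `how='outer'`: Keeps all rows from both DataFrames, filling missing matches with `NaN`.
- `on`: Column name(s) to join on. They must be present in both DataFrames. If `on`, `left_on`, and `right_on` are all omitted, pandas joins on the columns the two DataFrames have in common.
- `left_on` and `right_on`: Column names to join on when the key names differ between the DataFrames.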
```python
pd.merge(left, right, how='outer', on='key')
```
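Because `on` also accepts a list, rows can be matched on several columns at once. The sketch below uses two small, made-up tables. The `store`, `month`, `sales`, and `returns` columns are hypothetical. It shows that a row counts as matched only when every key column agrees, and that the unmatched (store, month) pairs still appear with `NaN` on the missing side.

```python
import pandas as pd

# Hypothetical tables keyed by the pair (store, month)
left = pd.DataFrame({
    'store': ['S1', 'S1', 'S2'],
    'month': ['Jan', 'Feb', 'Jan'],
    'sales': [100, 120, 90],
})
right = pd.DataFrame({
    'store': ['S1', 'S2', 'S2'],
    'month': ['Jan', 'Jan', 'Feb'],
    'returns': [5, 3, 4],
})

# Passing a list to `on` matches rows only when BOTH key columns are equal
result = pd.merge(left, right, how='outer', on=['store', 'month'])
print(result)
```

Here (S1, Jan) and (S2, Jan) are matched. (S1, Feb) appears only in `left` and (S2, Feb) only in `right`, so each of those rows has `NaN` in the other table's column.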
Example
This example shows how to merge two DataFrames using an outer merge on a common column. It keeps all rows from both tables and fills missing values with NaN.
```python
import pandas as pd

# Create first DataFrame
left = pd.DataFrame({
    'key': ['A', 'B', 'C'],
    'left_val': [1, 2, 3]
})

# Create second DataFrame
right = pd.DataFrame({
    'key': ['B', 'C', 'D'],
    'right_val': [4, 5, 6]
})

# Perform outer merge
result = pd.merge(left, right, how='outer', on='key')
print(result)
```
Output
```
  key  left_val  right_val
0   A       1.0        NaN
1   B       2.0        4.0
2   C       3.0        5.0
3   D       NaN        6.0
```
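To see which side each row of the result came from, `pd.merge()` also accepts `indicator=True`. This adds a categorical `_merge` column. The minimal sketch below reuses the same two DataFrames as the example above.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'right_val': [4, 5, 6]})

# indicator=True adds a '_merge' column whose values are
# 'left_only', 'right_only', or 'both'
result = pd.merge(left, right, how='outer', on='key', indicator=True)
print(result)

# Keep only the rows that had no match in `right`
left_only = result[result['_merge'] == 'left_only']
print(left_only)
```

This is a quick way to find the rows that an outer merge filled with `NaN`.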
Common Pitfalls
Common mistakes when using outer merge include:
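- Omitting the join keys when the key columns have different names. `on` only works when both DataFrames use the same name; otherwise use `left_on` and `right_on`.
- Confusing `how='outer'` with `how='inner'`, which only keeps rows whose key appears in both DataFrames.
- Forgetting that unmatched rows contain `NaN` after the merge, which also turns integer columns into floats.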
Always check your key columns and understand that outer merge keeps all rows from both DataFrames.
```python
import pandas as pd

# Wrong: no 'on' specified when keys differ
left = pd.DataFrame({'key1': ['A', 'B'], 'val1': [1, 2]})
right = pd.DataFrame({'key2': ['B', 'C'], 'val2': [3, 4]})

# This raises pandas.errors.MergeError because the DataFrames share no column names
# pd.merge(left, right, how='outer')

# Correct way: specify left_on and right_on
result = pd.merge(left, right, how='outer', left_on='key1', right_on='key2')
print(result)
```
Output
```
  key1  val1 key2  val2
0    A   1.0  NaN   NaN
1    B   2.0    B   3.0
2  NaN   NaN    C   4.0
```
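Merging with `left_on` and `right_on` keeps both key columns, and each one has `NaN` where its side had no match. If you don't need the originals, one way to collapse them into a single key column is to fill the gaps in one column from the other. The new column name `key` is an arbitrary choice.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['A', 'B'], 'val1': [1, 2]})
right = pd.DataFrame({'key2': ['B', 'C'], 'val2': [3, 4]})

result = pd.merge(left, right, how='outer', left_on='key1', right_on='key2')

# Where key1 is NaN (row only in `right`), take the value from key2
result['key'] = result['key1'].fillna(result['key2'])

# Keep the combined key and drop the two original key columns
result = result[['key', 'val1', 'val2']]
print(result)
```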
Quick Reference
| Parameter | Description |
|---|---|
| left | First DataFrame to merge |
| right | Second DataFrame to merge |
| how='outer' | Keep all rows from both DataFrames |
| on | Column(s) to join on (same name in both) |
| left_on | Column(s) from left DataFrame to join on |
| right_on | Column(s) from right DataFrame to join on |
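The `how` parameter controls which keys survive the merge. The quick sketch below compares `how='inner'` and `how='outer'` on the example DataFrames to show the difference in row counts.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'right_val': [4, 5, 6]})

inner = pd.merge(left, right, how='inner', on='key')  # keys in both: B, C
outer = pd.merge(left, right, how='outer', on='key')  # keys in either: A, B, C, D

print(len(inner), len(outer))  # prints: 2 4
```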
Key Takeaways
- Use `pandas.merge()` with `how='outer'` to keep all rows from both DataFrames.
- Specify the join keys with `on`, or with `left_on` and `right_on` if the names differ.
- Outer merge fills the columns from the non-matching side with `NaN`.
- Check that your key columns exist in both DataFrames and have compatible types to avoid merge errors.
- Outer merge is useful for combining datasets without dropping rows that lack a match.
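Unmatched rows are filled with `NaN`, so integer columns come back as `float64`. The sketch below shows two common ways to handle this. The fill value `0` is an assumption for illustration and may not suit your data.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'right_val': [4, 5, 6]})

result = pd.merge(left, right, how='outer', on='key')

# NaN in unmatched rows forces the integer columns to float64
print(result.dtypes)

# Option 1: replace NaN with a default value (0 here, an assumption) and cast back to int
filled = result.fillna({'left_val': 0, 'right_val': 0}).astype({'left_val': int, 'right_val': int})

# Option 2: keep the gaps but use pandas' nullable integer dtype, which shows <NA>
nullable = result.astype({'left_val': 'Int64', 'right_val': 'Int64'})

print(filled)
print(nullable)
```

Option 1 suits cases where a missing value really means "zero". Option 2 keeps the distinction between "no match" and an actual value.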