
How to Use Outer Merge in pandas for DataFrames

Use pandas.merge() with how='outer' to combine two DataFrames and keep every row from both. Where a key appears on only one side, the columns from the other side are filled with NaN. The join can be made on one or more columns or on the indexes.

Syntax

The basic syntax for an outer merge in pandas is:

  • pandas.merge(left, right, how='outer', on=None, left_on=None, right_on=None)

Explanation:

  • left: The first DataFrame.
  • right: The second DataFrame.
  • how='outer': Keeps all rows from both DataFrames, filling missing matches with NaN.
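  • on: Column name(s) to join on; they must exist in both DataFrames. If no key arguments are given, pandas joins on the columns the two DataFrames have in common.
  • left_on and right_on: Column names to join on when the key columns are named differently in the two DataFrames.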
python
pd.merge(left, right, how='outer', on='key')
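
The syntax above joins on columns. As a minimal sketch of the index-based variant, the following uses the left_index and right_index arguments of pandas.merge(), which are not listed above. The two small DataFrames are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the join key lives in the index, not in a column
left = pd.DataFrame({'left_val': [1, 2]}, index=['A', 'B'])
right = pd.DataFrame({'right_val': [3, 4]}, index=['B', 'C'])

# left_index/right_index tell merge() to match rows on their index labels
by_index = pd.merge(left, right, how='outer',
                    left_index=True, right_index=True)
print(by_index)
```

The result should have one row per distinct index label (A, B, and C), with NaN wherever a label is missing from one side.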

Example

This example shows how to merge two DataFrames using an outer merge on a common column. It keeps all rows from both tables and fills missing values with NaN.

python
import pandas as pd

# Create first DataFrame
left = pd.DataFrame({
    'key': ['A', 'B', 'C'],
    'left_val': [1, 2, 3]
})

# Create second DataFrame
right = pd.DataFrame({
    'key': ['B', 'C', 'D'],
    'right_val': [4, 5, 6]
})

# Perform outer merge
result = pd.merge(left, right, how='outer', on='key')
print(result)
Output
  key  left_val  right_val
0   A       1.0        NaN
1   B       2.0        4.0
2   C       3.0        5.0
3   D       NaN        6.0
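
To see which side each row came from, one option is the indicator argument of pandas.merge(). The sketch below reuses the same example data. The variable names tagged and unmatched are illustrative choices.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'right_val': [4, 5, 6]})

# indicator=True adds a '_merge' column whose values are
# 'left_only', 'right_only', or 'both'
tagged = pd.merge(left, right, how='outer', on='key', indicator=True)
print(tagged[['key', '_merge']])

# Keep only the rows that found no partner on the other side
unmatched = tagged[tagged['_merge'] != 'both']
print(unmatched)
```

This can be a quick way to audit an outer merge before deciding how to handle the unmatched rows.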

Common Pitfalls

Common mistakes when using outer merge include:

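  • Passing on (or no key arguments at all) when the key columns have different names in the two DataFrames; use left_on and right_on instead.
  • Confusing how='outer' with how='inner', which only keeps matching rows.
  • Forgetting that missing values appear as NaN after the merge, which also converts integer columns that contain gaps to float.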

Always check your key columns and understand that outer merge keeps all rows from both DataFrames.

python
import pandas as pd

# Wrong: no key arguments given, and the DataFrames share no column names
left = pd.DataFrame({'key1': ['A', 'B'], 'val1': [1, 2]})
right = pd.DataFrame({'key2': ['B', 'C'], 'val2': [3, 4]})

# This raises a MergeError because there are no common columns to merge on
# pd.merge(left, right, how='outer')

# Correct way specifying left_on and right_on
result = pd.merge(left, right, how='outer', left_on='key1', right_on='key2')
print(result)
Output
  key1  val1 key2  val2
0    A   1.0  NaN   NaN
1    B   2.0    B   3.0
2  NaN   NaN    C   4.0
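
The NaN pitfall also affects dtypes: integer columns with gaps come back as float64. The sketch below shows two possible ways to deal with that, assuming made-up column names and an arbitrary fill value of 0. Which option fits depends on what a missing value should mean in your data.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B'], 'val': [1, 2]})
right = pd.DataFrame({'key': ['B', 'C'], 'other': [3, 4]})

merged = pd.merge(left, right, how='outer', on='key')
print(merged.dtypes)  # 'val' and 'other' are float64 because of the NaN gaps

# Option 1: keep the gaps but restore integers with the nullable Int64 dtype
as_nullable = merged.astype({'val': 'Int64', 'other': 'Int64'})

# Option 2: replace the gaps with a sentinel (0 is an arbitrary choice here)
filled = (merged.fillna({'val': 0, 'other': 0})
                .astype({'val': 'int64', 'other': 'int64'}))
print(filled)
```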

Quick Reference

Parameter   | Description
left        | First DataFrame to merge
right       | Second DataFrame to merge
how='outer' | Keep all rows from both DataFrames
on          | Column(s) to join on (same name in both)
left_on     | Column(s) from left DataFrame to join on
right_on    | Column(s) from right DataFrame to join on
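
Since on accepts more than one column, here is a minimal sketch of an outer merge on a two-column key. The sales and costs DataFrames and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical data keyed by the combination of region and year
sales = pd.DataFrame({'region': ['N', 'N', 'S'],
                      'year': [2023, 2024, 2023],
                      'sales': [10, 20, 30]})
costs = pd.DataFrame({'region': ['N', 'S', 'S'],
                      'year': [2024, 2023, 2024],
                      'cost': [5, 6, 7]})

# A list for 'on' means rows match only when every listed column agrees
both = pd.merge(sales, costs, how='outer', on=['region', 'year'])
print(both)
```

Each distinct (region, year) pair from either side should appear once in the result, with NaN in sales or cost where that pair exists on only one side.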

Key Takeaways

  • Use pandas.merge() with how='outer' to keep all rows from both DataFrames.
  • Specify the join keys with on, or with left_on and right_on if the names differ.
  • Outer merge fills missing matches with NaN values.
  • Check your key columns carefully to avoid merge errors.
  • Outer merge is useful for combining datasets without dropping rows from either side.