
How to Use merge in pandas: Syntax, Example, and Tips

Use pandas.merge() to combine two DataFrames by matching values in one or more columns or indices. Use the on parameter to name the columns to join on, and the how parameter to choose the join type: 'inner', 'left', 'right', or 'outer'.

Syntax

The basic syntax of pandas.merge(), showing its most commonly used parameters, is:

python
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, suffixes=('_x', '_y'))

  • left: The first DataFrame.
  • right: The second DataFrame.
  • how: Type of join - 'inner' (default), 'left', 'right', or 'outer'.
  • on: Column name(s) to join on. Each must be present in both DataFrames. If on, left_on, and right_on are all omitted, pandas joins on every column name the two DataFrames share.
  • left_on and right_on: Use these instead of on if the join columns have different names.
  • suffixes: Pair of strings appended to overlapping non-key column names from the left and right DataFrames, respectively.
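Because on accepts more than one column name, a join can require several columns to agree at once. The sketch below illustrates that with made-up data; the DataFrames sales and targets and their columns are hypothetical and not part of the original example.

```python
import pandas as pd

# Hypothetical sample data keyed by store and year
sales = pd.DataFrame({
    'store': ['N', 'N', 'S'],
    'year': [2023, 2024, 2023],
    'sales': [100, 120, 90],
})
targets = pd.DataFrame({
    'store': ['N', 'S', 'S'],
    'year': [2024, 2023, 2024],
    'target': [110, 95, 105],
})

# A row is kept only when both 'store' and 'year' match
merged = pd.merge(sales, targets, on=['store', 'year'], how='inner')
print(merged)
```

Only the (N, 2024) and (S, 2023) combinations appear in both DataFrames, so the result has two rows.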

Example

This example shows how to merge two DataFrames on a common column 'key' using an inner join, which keeps only matching rows.

python
import pandas as pd

# Create first DataFrame
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})

# Create second DataFrame
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

# Merge on 'key' column with inner join
merged_df = pd.merge(df1, df2, on='key', how='inner')

print(merged_df)
Output
  key  value1  value2
0   B       2       5
1   D       4       6
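To see how the how parameter changes the result, the same two DataFrames can be merged with the other join types. This is a small sketch that reuses the df1 and df2 definitions from the example; the variable names left_df, right_df, and outer_df are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

# Left join: keep every row of df1; value2 is NaN where df2 has no match
left_df = pd.merge(df1, df2, on='key', how='left')

# Right join: keep every row of df2; value1 is NaN where df1 has no match
right_df = pd.merge(df1, df2, on='key', how='right')

# Outer join: keep every key that appears in either DataFrame
outer_df = pd.merge(df1, df2, on='key', how='outer')

print(len(left_df), len(right_df), len(outer_df))
```

The inner join above returned 2 rows; here the left and right joins each return 4 rows and the outer join returns 6, with missing values filled in as NaN.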

Common Pitfalls

Common mistakes when using merge include:

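  • Passing on with a column name that is missing from either DataFrame raises a KeyError. Omitting on makes pandas join on every shared column name, which may not be what you intended.
  • Using the wrong how join type can drop rows unintentionally. For example, the default 'inner' join discards any row whose key has no match in the other DataFrame.
  • Overlapping non-key column names receive the default suffixes '_x' and '_y', which are easy to confuse. Pass descriptive suffixes instead.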

Always check your join keys and join type carefully.
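One way to check a join is to let pandas report where each row came from. The sketch below uses the indicator and validate parameters of pandas.merge(), which are not covered in the syntax section above; the orders and customers DataFrames are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data: customer 3 has no profile, customer 4 has no order
orders = pd.DataFrame({'customer_id': [1, 2, 3], 'amount': [50, 20, 70]})
customers = pd.DataFrame({'customer_id': [1, 2, 4], 'name': ['Ann', 'Bob', 'Dee']})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'
# validate='one_to_one' raises an error if a key is duplicated on either side
checked = pd.merge(orders, customers, on='customer_id', how='outer',
                   indicator=True, validate='one_to_one')

# Rows that an inner join would have dropped
unmatched = checked[checked['_merge'] != 'both']
print(unmatched)
```

Inspecting the unmatched rows before switching to a narrower join type shows exactly which keys would be lost.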

python
import pandas as pd

# Wrong: columns have different names but 'on' is used
left = pd.DataFrame({'key1': ['A', 'B'], 'val': [1, 2]})
right = pd.DataFrame({'key2': ['A', 'B'], 'val': [3, 4]})

# This raises a KeyError because 'key1' does not exist in right
# pd.merge(left, right, on='key1')

# Correct way: use left_on and right_on
correct_merge = pd.merge(left, right, left_on='key1', right_on='key2')
print(correct_merge)
Output
  key1  val_x key2  val_y
0    A      1    A      3
1    B      2    B      4
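The val_x and val_y names in this output come from the default suffixes. As a minimal sketch, the same merge can be given more descriptive names through the suffixes parameter; the suffix strings '_left' and '_right' and the variable name labeled are arbitrary choices made for illustration.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['A', 'B'], 'val': [1, 2]})
right = pd.DataFrame({'key2': ['A', 'B'], 'val': [3, 4]})

# Replace the default '_x' / '_y' with suffixes that say where each column came from
labeled = pd.merge(left, right, left_on='key1', right_on='key2',
                   suffixes=('_left', '_right'))
print(labeled.columns.tolist())
```

The overlapping val column now appears as val_left and val_right, while key1 and key2 keep their names because they do not overlap.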

Quick Reference

Parameter   Description                          Example Values
left        First DataFrame                      df1
right       Second DataFrame                     df2
on          Column(s) to join on                 'key'
how         Type of join                         'inner', 'left', 'right', 'outer'
left_on     Join column(s) in left DataFrame     'key1'
right_on    Join column(s) in right DataFrame    'key2'
suffixes    Suffixes for overlapping columns     ('_x', '_y')
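The parameters above all describe column joins. For joining on indices, pandas.merge() also accepts left_index and right_index, which are not listed in the table. The following is a small sketch with hypothetical prices and stock DataFrames indexed by product name.

```python
import pandas as pd

# Hypothetical sample data indexed by product name
prices = pd.DataFrame({'price': [1.5, 2.0, 0.5]}, index=['apple', 'pear', 'plum'])
stock = pd.DataFrame({'qty': [10, 4]}, index=['apple', 'plum'])

# Match rows by index label instead of by a column
by_index = pd.merge(prices, stock, left_index=True, right_index=True, how='inner')
print(by_index)
```

Only 'apple' and 'plum' appear in both indices, so the inner join keeps those two rows.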

Key Takeaways

Use pandas.merge() to combine DataFrames by matching columns or indices.
Specify the join columns with 'on', or with 'left_on' and 'right_on' if the names differ.
Choose the join type with 'how' to control which rows appear in the result.
Check for overlapping column names and use 'suffixes' to avoid confusion.
Always verify your join keys and join type to avoid unexpected results.