0
0
PandasDebug / FixBeginner · 3 min read

How to Handle Duplicate Columns Merge in pandas

When merging DataFrames in pandas, duplicate columns appear if both have columns with the same name. Use the suffixes parameter in pd.merge() to add suffixes to overlapping columns or select only needed columns before merging to avoid duplicates.
🔍

Why This Happens

Duplicate columns appear when you merge two DataFrames that share column names other than the key columns. pandas keeps both columns by adding suffixes like _x and _y by default, which can be confusing or unwanted.

python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'value': ['A', 'B', 'C']
})

df2 = pd.DataFrame({
    'id': [1, 2, 3],
    'value': ['D', 'E', 'F']
})

merged = pd.merge(df1, df2, on='id')
print(merged)
Output
id value_x value_y 0 1 A D 1 2 B E 2 3 C F
🔧

The Fix

To handle duplicate columns, specify the suffixes parameter in pd.merge() to control how overlapping columns are renamed. Alternatively, select only the columns you want from one DataFrame before merging to avoid duplicates.

python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'value': ['A', 'B', 'C']
})

df2 = pd.DataFrame({
    'id': [1, 2, 3],
    'value': ['D', 'E', 'F']
})

# Using suffixes to rename duplicate columns
merged = pd.merge(df1, df2, on='id', suffixes=('_left', '_right'))
print(merged)

# Or select only needed columns from df2
merged_select = pd.merge(df1, df2[['id', 'value']], on='id')
print(merged_select)
Output
id value_left value_right 0 1 A D 1 2 B E 2 3 C F id value 0 1 D 1 2 E 2 3 F
🛡️

Prevention

To avoid duplicate columns in merges, always check your DataFrames for overlapping column names before merging. Use the suffixes parameter to control column names or rename columns beforehand. Selecting only necessary columns before merging keeps your data clean and easier to work with.

⚠️

Related Errors

Other common issues include:

  • KeyError: Happens if the merge key column is missing in one DataFrame.
  • ValueError: columns overlap but no suffix specified: Occurs when you merge with validate='one_to_one' and duplicate columns exist without suffixes.
  • Unexpected NaNs: Can appear if merge keys do not match exactly.

Key Takeaways

Use the suffixes parameter in pd.merge() to rename duplicate columns clearly.
Select only needed columns before merging to avoid duplicates.
Check for overlapping column names before merging to prevent confusion.
Rename columns beforehand if you want custom names instead of suffixes.
Understand merge keys to avoid unexpected missing data or errors.