
How to Merge on Multiple Columns in pandas: Simple Guide

To merge pandas DataFrames on multiple columns, use the merge() function with the on parameter set to a list of column names. This matches rows where all specified columns have the same values in both DataFrames.

Syntax

The basic syntax to merge on multiple columns in pandas is:

pd.merge(left_df, right_df, on=['col1', 'col2', ...], how='inner')
  • left_df and right_df are the DataFrames to merge.
  • on is a list of column names to join on; every name must exist in both DataFrames.
  • how specifies the type of merge: 'inner' (default), 'left', 'right', or 'outer'.

This merges rows where all columns in the on list match in both DataFrames.

python
pd.merge(left_df, right_df, on=['col1', 'col2'], how='inner')
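Listing every key column matters because pandas pairs rows only when all of them match. The sketch below reuses the city/year data from the example that follows and compares merging on city alone with merging on both columns. With a single key, every Austin row pairs with every Austin row, and the second shared column (year) is duplicated as year_x and year_y.

```python
import pandas as pd

# Same data as the city/year example in the next section
left = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Dallas'],
    'year': [2010, 2011, 2010, 2011],
    'population': [790000, 820000, 1200000, 1250000]
})
right = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Houston'],
    'year': [2010, 2012, 2010, 2011],
    'area': [300, 305, 340, 600]
})

# One key: 2x2 Austin pairs + 2x1 Dallas pairs = 6 rows, year gets suffixed
city_only = pd.merge(left, right, on='city')
print(city_only.shape)          # (6, 5)
print(list(city_only.columns))  # ['city', 'year_x', 'population', 'year_y', 'area']

# Both keys: only rows where city AND year agree
both_keys = pd.merge(left, right, on=['city', 'year'])
print(both_keys.shape)          # (2, 4)
```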

Example

This example merges two DataFrames on two columns, city and year. Only rows whose (city, year) pair appears in both DataFrames are kept.

python
import pandas as pd

# Create first DataFrame
left = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Dallas'],
    'year': [2010, 2011, 2010, 2011],
    'population': [790000, 820000, 1200000, 1250000]
})

# Create second DataFrame
right = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Houston'],
    'year': [2010, 2012, 2010, 2011],
    'area': [300, 305, 340, 600]
})

# Merge on multiple columns
merged = pd.merge(left, right, on=['city', 'year'], how='inner')
print(merged)
Output
     city  year  population  area
0  Austin  2010      790000   300
1  Dallas  2010     1200000   340
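The inner merge above drops Austin 2011, Dallas 2011, Austin 2012 and Houston 2011 because they appear on only one side. As a sketch on the same data, how='left' keeps every row of left (filling area with NaN where there is no match), and how='outer' combined with indicator=True adds a _merge column that labels each row as 'both', 'left_only' or 'right_only'.

```python
import pandas as pd

# Same data as the example above
left = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Dallas'],
    'year': [2010, 2011, 2010, 2011],
    'population': [790000, 820000, 1200000, 1250000]
})
right = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Houston'],
    'year': [2010, 2012, 2010, 2011],
    'area': [300, 305, 340, 600]
})

# how='left': all 4 rows of `left`; unmatched (city, year) pairs get NaN in 'area'
left_join = pd.merge(left, right, on=['city', 'year'], how='left')
print(left_join)

# how='outer' + indicator=True: every row from both sides, tagged by origin
outer = pd.merge(left, right, on=['city', 'year'], how='outer', indicator=True)
print(outer)
```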

Common Pitfalls

Common mistakes when merging on multiple columns include:

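  • Passing a single comma-separated string (for example 'city,year') instead of a list; pandas treats the whole string as one column name.
  • Key column names that differ between the DataFrames (matching is case sensitive). If the names legitimately differ, use left_on and right_on instead of on.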
  • Forgetting that how='inner' only keeps rows with matches in both DataFrames.

Example of wrong and right usage:

python
# Wrong: a comma-separated string is treated as a single column named 'city,year'
# pd.merge(left, right, on='city,year')  # raises KeyError

# Right: pass a list of column names
pd.merge(left, right, on=['city', 'year'])
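When the key columns are spelled differently in the two DataFrames, on cannot be used, but left_on and right_on can pair them up by position. In this sketch the capitalized City/Year frame is a hypothetical example; note that pandas keeps both sets of key columns, so the duplicates are dropped afterwards.

```python
import pandas as pd

left = pd.DataFrame({'city': ['Austin', 'Dallas'], 'year': [2010, 2010],
                     'population': [790000, 1200000]})
# Hypothetical right-hand table whose key columns are capitalized differently
right_caps = pd.DataFrame({'City': ['Austin', 'Dallas'], 'Year': [2010, 2011],
                           'area': [300, 340]})

# on=['city', 'year'] would fail here with a KeyError because 'city' != 'City'.
# left_on/right_on match left_on[i] against right_on[i].
merged = pd.merge(left, right_caps,
                  left_on=['city', 'year'], right_on=['City', 'Year'],
                  how='inner')

# Only (Austin, 2010) matches; drop the redundant right-hand key columns
merged = merged.drop(columns=['City', 'Year'])
print(merged)
```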

Quick Reference

Parameter | Description | Example
left | Left DataFrame to merge (first argument) | left
right | Right DataFrame to merge (second argument) | right
on | List of columns to join on | ['city', 'year']
how | Type of merge: 'inner' (default), 'left', 'right', 'outer' | 'inner'
suffixes | Suffixes added to overlapping non-key columns | ('_x', '_y') (default)
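The suffixes parameter only affects non-key columns that exist in both DataFrames. In this sketch both frames carry a hypothetical source column; passing custom suffixes gives the two copies more descriptive names than the default source_x and source_y.

```python
import pandas as pd

# Hypothetical frames that share a non-key 'source' column
left = pd.DataFrame({'city': ['Austin', 'Dallas'], 'year': [2010, 2010],
                     'source': ['census', 'census']})
right = pd.DataFrame({'city': ['Austin', 'Dallas'], 'year': [2010, 2010],
                      'source': ['survey', 'survey']})

# Key columns (city, year) appear once; the overlapping 'source' gets the suffixes
merged = pd.merge(left, right, on=['city', 'year'], suffixes=('_left', '_right'))
print(merged)
```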

Key Takeaways

  • Pass a list of column names to the on parameter to merge on multiple columns.
  • Make sure the key column names match exactly in both DataFrames.
  • The how parameter controls which rows are kept after merging.
  • Passing a single comma-separated string instead of a list raises a KeyError.
  • Check the key columns for missing or mismatched values (including mismatched dtypes) to avoid unexpected results.
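A common source of mismatches is key columns that look the same but differ in type or formatting, for example year stored as text on one side. Recent pandas versions typically refuse to merge an integer key with a string key and raise a ValueError, while stray whitespace or different capitalization silently prevents matches. The sketch below assumes a hypothetical messy right-hand frame, normalizes its keys first, then uses indicator=True to confirm nothing was left unmatched.

```python
import pandas as pd

left = pd.DataFrame({'city': ['Austin', 'Dallas'], 'year': [2010, 2010],
                     'population': [790000, 1200000]})
# Hypothetical messy input: year as text, city with stray whitespace and lowercase
right = pd.DataFrame({'city': ['austin ', 'Dallas'], 'year': ['2010', '2010'],
                      'area': [300, 340]})

# Normalize the key columns so they match `left` in dtype and formatting
right = right.assign(
    city=right['city'].str.strip().str.title(),
    year=right['year'].astype(int),
)

merged = pd.merge(left, right, on=['city', 'year'], how='outer', indicator=True)
print(merged)

# Any row not tagged 'both' points to a key that still did not match
unmatched = merged[merged['_merge'] != 'both']
print(unmatched.empty)
```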