How to Merge on Multiple Columns in pandas: Simple Guide
To merge pandas DataFrames on multiple columns, use the merge() function with the on parameter set to a list of column names. This matches rows where all specified columns have the same values in both DataFrames.
Syntax
The basic syntax to merge on multiple columns in pandas is:
pd.merge(left_df, right_df, on=['col1', 'col2', ...], how='inner')
- left_df and right_df are the DataFrames to merge.
- on is a list of column names to join on.
- how specifies the type of merge: 'inner' (default), 'left', 'right', or 'outer'.
This merges rows where all columns in the on list match in both DataFrames.
python
pd.merge(left_df, right_df, on=['col1', 'col2'], how='inner')
Example
This example shows how to merge two DataFrames on two columns: city and year. It combines matching rows from both DataFrames.
python
import pandas as pd

# Create first DataFrame
left = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Dallas'],
    'year': [2010, 2011, 2010, 2011],
    'population': [790000, 820000, 1200000, 1250000]
})

# Create second DataFrame
right = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Houston'],
    'year': [2010, 2012, 2010, 2011],
    'area': [300, 305, 340, 600]
})

# Merge on multiple columns
merged = pd.merge(left, right, on=['city', 'year'], how='inner')
print(merged)
Output
     city  year  population  area
0  Austin  2010      790000   300
1  Dallas  2010     1200000   340
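The inner merge above drops every row whose (city, year) pair appears in only one DataFrame. As a minimal sketch of how to see those rows, the same data can be merged with how='outer' and the indicator=True option of pd.merge, which adds a _merge column. The variable name outer is only illustrative.

```python
import pandas as pd

left = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Dallas'],
    'year': [2010, 2011, 2010, 2011],
    'population': [790000, 820000, 1200000, 1250000],
})
right = pd.DataFrame({
    'city': ['Austin', 'Austin', 'Dallas', 'Houston'],
    'year': [2010, 2012, 2010, 2011],
    'area': [300, 305, 340, 600],
})

# Keep every (city, year) pair from both sides and record where each row came from
outer = pd.merge(left, right, on=['city', 'year'], how='outer', indicator=True)

# Count rows by origin: 'both', 'left_only', or 'right_only'
print(outer['_merge'].value_counts())
```

Rows marked left_only or right_only are the ones the inner merge leaves out. With this data there are two of each, plus the two matched rows shown in the output above.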
Common Pitfalls
Common mistakes when merging on multiple columns include:
- Using a single string instead of a list for the on parameter when merging on multiple columns.
- Column names not matching exactly in both DataFrames (case sensitive).
- Forgetting that how='inner' only keeps rows with matches in both DataFrames.
Example of wrong and right usage:
python
# Wrong: passing one comma-separated string instead of a list
# pd.merge(left, right, on='city,year')  # raises KeyError: no column is named 'city,year'

# Right: pass a list of column names
pd.merge(left, right, on=['city', 'year'])
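The second pitfall, key columns whose names differ between the two DataFrames, can be handled in more than one way. The sketch below assumes a hypothetical right-hand table with capitalized key names (City, Year) and shows two options: the left_on and right_on parameters of pd.merge, or renaming before the merge.

```python
import pandas as pd

left = pd.DataFrame({
    'city': ['Austin', 'Dallas'],
    'year': [2010, 2010],
    'population': [790000, 1200000],
})
# Hypothetical right-hand table whose key columns are capitalized differently
right = pd.DataFrame({
    'City': ['Austin', 'Dallas'],
    'Year': [2010, 2010],
    'area': [300, 340],
})

# Option 1: name the key columns separately for each side
merged_a = pd.merge(left, right, left_on=['city', 'year'], right_on=['City', 'Year'])

# Option 2: rename the right-hand columns first, then merge with on=
right_renamed = right.rename(columns={'City': 'city', 'Year': 'year'})
merged_b = pd.merge(left, right_renamed, on=['city', 'year'])

print(merged_a.columns.tolist())
print(merged_b.columns.tolist())
```

Option 1 keeps both sets of key columns in the result, while option 2 yields a single city and year pair, which is usually easier to work with downstream.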
Quick Reference
| Parameter | Description | Example |
|---|---|---|
| left | Left DataFrame to merge (left_df in the syntax above) | left |
| right | Right DataFrame to merge (right_df in the syntax above) | right |
| on | List of columns to join on | ['city', 'year'] |
| how | Type of merge: 'inner', 'left', 'right', 'outer' | 'inner' |
| suffixes | Suffixes for overlapping columns | ('_x', '_y') |
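The suffixes parameter only matters when both DataFrames share a column name that is not one of the join keys. A small sketch, assuming two hypothetical tables that each have a value column:

```python
import pandas as pd

# Hypothetical tables that both carry a non-key column named 'value'
left = pd.DataFrame({'city': ['Austin', 'Dallas'], 'year': [2010, 2010], 'value': [1, 2]})
right = pd.DataFrame({'city': ['Austin', 'Dallas'], 'year': [2010, 2010], 'value': [10, 20]})

# The key columns appear once; the overlapping 'value' column gets one suffix per side
merged = pd.merge(left, right, on=['city', 'year'], suffixes=('_left', '_right'))
print(merged.columns.tolist())
```

Without an explicit suffixes argument, pandas would fall back to the default ('_x', '_y') listed in the table.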
Key Takeaways
- Use a list of column names in the 'on' parameter to merge on multiple columns.
- Ensure column names match exactly, including case, in both DataFrames.
- The 'how' parameter controls which rows are kept after merging.
- Passing a single comma-separated string instead of a list raises a KeyError.
- Check your data for missing or mismatched key values to avoid unexpected results.
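As one possible way to check key columns before merging, the sketch below uses hypothetical data with a trailing space in one city name and years stored as strings on one side. It normalizes both, counts missing key values, and then uses the validate option of pd.merge to confirm that each key pair is unique on both sides.

```python
import pandas as pd

keys = ['city', 'year']

# Hypothetical inputs: 'Austin ' has a trailing space, and right stores years as strings
left = pd.DataFrame({'city': ['Austin ', 'Dallas'], 'year': [2010, 2010],
                     'population': [790000, 1200000]})
right = pd.DataFrame({'city': ['Austin', 'Dallas'], 'year': ['2010', '2010'],
                      'area': [300, 340]})

# Normalize the key columns so equal values actually compare equal
left['city'] = left['city'].str.strip()
right['year'] = right['year'].astype(int)

# Count missing values in the key columns on each side
missing = int(left[keys].isna().sum().sum() + right[keys].isna().sum().sum())
print('missing key values:', missing)

# validate='one_to_one' raises MergeError if a key pair is duplicated on either side
merged = pd.merge(left, right, on=keys, how='inner', validate='one_to_one')
print(merged)
```

Which normalization steps are needed depends on the data; stripping whitespace and aligning dtypes are just two common cases.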