How to Use merge in pandas: Syntax, Example, and Tips
Use pandas.merge() to combine two DataFrames by matching values in one or more columns or indices. Specify the on parameter for the columns to join on, and use how to choose the join type: 'inner', 'left', 'right', or 'outer'.
Syntax
The basic syntax of pandas.merge(), showing its most commonly used parameters, is:
```python
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, suffixes=('_x', '_y'))
```
- left: The first DataFrame.
- right: The second DataFrame.
- how: Type of join: 'inner' (default), 'left', 'right', or 'outer'.
- on: Column name(s) to join on. Must be present in both DataFrames.
- left_on and right_on: Use these if the join columns have different names in the two DataFrames.
- suffixes: Pair of strings appended to overlapping non-key column names from the left and right DataFrames.
Example
This example shows how to merge two DataFrames on a common column 'key' using an inner join, which keeps only matching rows.
```python
import pandas as pd

# Create first DataFrame
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})

# Create second DataFrame
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

# Merge on 'key' column with inner join
merged_df = pd.merge(df1, df2, on='key', how='inner')
print(merged_df)
```
Output
```
  key  value1  value2
0   B       2       5
1   D       4       6
```
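To see how the how parameter changes which rows survive, the following sketch reruns the merge from the example with 'left' and 'outer' joins. It redefines the same df1 and df2 so that it runs on its own; the names left_joined and outer_joined are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

# Left join: keep every row of df1; keys missing from df2 get NaN in value2
left_joined = pd.merge(df1, df2, on='key', how='left')

# Outer join: keep every key that appears in either DataFrame
outer_joined = pd.merge(df1, df2, on='key', how='outer')

print(left_joined)   # 4 rows: A, B, C, D
print(outer_joined)  # 6 rows: A through F
```

The inner join above returned 2 rows, the left join returns 4, and the outer join returns 6, with NaN filling the columns that have no match.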
Common Pitfalls
Common mistakes when using merge include:
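- Omitting on makes pandas join on every column name the two DataFrames share, which can give unexpected results. Passing on with a column name that is missing from either DataFrame raises a KeyError.
- Using the wrong how join type can drop rows unintentionally.
- Overlapping column names receive the default '_x' and '_y' suffixes unless you set suffixes, which makes the resulting columns hard to tell apart.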
Always check your join keys and join type carefully.
```python
import pandas as pd

# Wrong: columns have different names but 'on' is used
left = pd.DataFrame({'key1': ['A', 'B'], 'val': [1, 2]})
right = pd.DataFrame({'key2': ['A', 'B'], 'val': [3, 4]})

# This raises a KeyError because 'key1' is not a column of right
# pd.merge(left, right, on='key1')

# Correct way: use left_on and right_on
correct_merge = pd.merge(left, right, left_on='key1', right_on='key2')
print(correct_merge)
```
Output
```
  key1  val_x key2  val_y
0    A      1    A      3
1    B      2    B      4
```
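The other two pitfalls, an omitted on and unlabelled overlapping columns, can be sketched in the same way. The DataFrames and the suffixes '_left' and '_right' below are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B'], 'val': [1, 2]})
right = pd.DataFrame({'key': ['A', 'B'], 'val': [3, 4]})

# With no 'on', pandas joins on every column name the frames share,
# here both 'key' and 'val'. No row matches on both, so the result is empty.
implicit = pd.merge(left, right)

# Name the key explicitly and label the overlapping 'val' columns
explicit = pd.merge(left, right, on='key', suffixes=('_left', '_right'))

print(len(implicit))           # 0
print(list(explicit.columns))  # ['key', 'val_left', 'val_right']
```

Descriptive suffixes make it clearer which source each column came from than the default '_x' and '_y'.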
Quick Reference
| Parameter | Description | Example Values |
|---|---|---|
| left | First DataFrame | df1 |
| right | Second DataFrame | df2 |
| on | Column(s) to join on | 'key' |
| how | Type of join | 'inner', 'left', 'right', 'outer' |
| left_on | Join column(s) in left DataFrame | 'key1' |
| right_on | Join column(s) in right DataFrame | 'key2' |
| suffixes | Suffixes for overlapping columns | ('_x', '_y') |
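
The table covers column-based joins. For matching on an index, pandas.merge() also accepts left_index and right_index, which are not listed above. A minimal sketch, assuming hypothetical customers and orders data in which one DataFrame keeps the key in its index:

```python
import pandas as pd

# Hypothetical data: customers is indexed by id, orders stores the id in a column
customers = pd.DataFrame({'name': ['Ann', 'Bob']}, index=[101, 102])
orders = pd.DataFrame({'customer_id': [101, 101, 102], 'amount': [20, 35, 15]})

# Match the 'customer_id' column of orders against the index of customers
merged = pd.merge(orders, customers, left_on='customer_id', right_index=True)
print(merged)
```

Each order row picks up the name of the customer whose index value equals its customer_id.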
Key Takeaways
Use pandas.merge() to combine DataFrames by matching columns or indices.
Specify the join columns with 'on' or 'left_on' and 'right_on' if names differ.
Choose the join type with 'how' to control which rows appear in the result.
Check for overlapping column names and use 'suffixes' to avoid confusion.
Always verify your join keys and join type to avoid unexpected results.
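One way to verify a merge is with the indicator and validate parameters of pandas.merge(). The sketch below uses small made-up DataFrames; the names checked and one_to_one are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

# indicator=True adds a '_merge' column: 'left_only', 'right_only', or 'both'
checked = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(checked['_merge'].value_counts())

# validate raises pandas.errors.MergeError if the keys are not unique as declared
one_to_one = pd.merge(df1, df2, on='key', validate='one_to_one')
print(len(one_to_one))  # 2
```

Rows marked 'left_only' or 'right_only' are the ones an inner join would drop, so inspecting them before choosing the join type can catch mismatched keys early.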