How to Use Inner Merge in pandas for DataFrames
Use pandas.merge() with the parameter how='inner' to combine two DataFrames, keeping only the rows whose keys match in both. This performs an inner join, returning the intersection of the data based on the specified columns.
Syntax
The basic syntax for an inner merge in pandas is:
```python
pd.merge(left, right, on='key_column', how='inner')
```
- left: The first DataFrame.
- right: The second DataFrame.
- on: Column name(s) to join on. Must be present in both DataFrames.
- how='inner': Specifies an inner join to keep only matching rows.
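Because on accepts column name(s), it can also take a list. The sketch below is a minimal illustration with made-up data: the sales and targets frames and their columns are hypothetical, not part of the original example. A row survives only when every listed column matches.

```python
import pandas as pd

# Hypothetical sample data: both frames share 'region' and 'year'
sales = pd.DataFrame({
    'region': ['East', 'East', 'West'],
    'year': [2022, 2023, 2023],
    'sales': [100, 120, 90],
})
targets = pd.DataFrame({
    'region': ['East', 'West', 'West'],
    'year': [2023, 2022, 2023],
    'target': [110, 80, 95],
})

# A row is kept only when BOTH 'region' and 'year' match
combined = pd.merge(sales, targets, on=['region', 'year'], how='inner')
print(combined)
```

Here only the (East, 2023) and (West, 2023) pairs appear in both frames, so the result has two rows.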
Example
This example shows how to merge two DataFrames on a common column using an inner join. Only rows with matching keys in both DataFrames are kept.
```python
import pandas as pd

# Create first DataFrame
left = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value_left': [1, 2, 3, 4]
})

# Create second DataFrame
right = pd.DataFrame({
    'key': ['B', 'C', 'E', 'F'],
    'value_right': [5, 6, 7, 8]
})

# Perform inner merge on 'key'
result = pd.merge(left, right, on='key', how='inner')
print(result)
```
Output
```
  key  value_left  value_right
0   B           2            5
1   C           3            6
```
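To see which keys an inner merge leaves out, one option is an outer merge with indicator=True, which adds a _merge column marking each row as 'left_only', 'right_only', or 'both'. This is a sketch for inspection only, reusing the same sample data; the names audit, shared_keys, and dropped_keys are illustrative.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value_left': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'C', 'E', 'F'], 'value_right': [5, 6, 7, 8]})

# Outer merge keeps every key; indicator=True adds a '_merge' column
audit = pd.merge(left, right, on='key', how='outer', indicator=True)

# Keys marked 'both' are the ones an inner merge returns
shared_keys = audit.loc[audit['_merge'] == 'both', 'key'].tolist()
# Everything else exists in only one DataFrame and is dropped by an inner merge
dropped_keys = audit.loc[audit['_merge'] != 'both', 'key'].tolist()

inner = pd.merge(left, right, on='key', how='inner')
print(sorted(shared_keys) == sorted(inner['key']))
print(sorted(dropped_keys))
```

The keys flagged 'both' are B and C, matching the inner merge result, while A, D, E, and F each appear in only one DataFrame.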
Common Pitfalls
Common mistakes when using inner merge include:
- Omitting or misspelling the on parameter. Without it, pandas joins on every column name the two DataFrames share, which can lead to unexpected merges or errors.
- Using columns with different names in each DataFrame without specifying left_on and right_on.
- Assuming inner merge keeps all rows; it only keeps rows with keys present in both DataFrames.
```python
import pandas as pd

# The join columns have different names: 'key1' vs 'key2'
left = pd.DataFrame({'key1': ['A', 'B'], 'val': [1, 2]})
right = pd.DataFrame({'key2': ['B', 'C'], 'val': [3, 4]})

# Wrong: 'on' requires the column to exist in both DataFrames.
# This raises a KeyError because 'key1' is missing from right.
# pd.merge(left, right, on='key1', how='inner')

# Correct way: specify left_on and right_on
correct_merge = pd.merge(left, right, left_on='key1', right_on='key2', how='inner')
print(correct_merge)
```
Output
```
  key1  val_x key2  val_y
0    B      2    B      3
```
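The first pitfall can be subtle with this same data. If on is left out entirely, pandas falls back to the columns both DataFrames share, which here is only val. The sketch below shows that fallback, and also uses the suffixes parameter of pd.merge (not covered above) to replace the default _x and _y labels with clearer ones; the suffix strings are arbitrary choices.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['A', 'B'], 'val': [1, 2]})
right = pd.DataFrame({'key2': ['B', 'C'], 'val': [3, 4]})

# With no 'on', pandas joins on the shared column names: here only 'val'.
# The 'val' values (1, 2) and (3, 4) never match, so nothing is kept.
implicit = pd.merge(left, right, how='inner')
print(len(implicit))

# Explicit keys, with readable suffixes for the overlapping 'val' column
explicit = pd.merge(left, right, left_on='key1', right_on='key2',
                    how='inner', suffixes=('_left', '_right'))
print(explicit.columns.tolist())
```

The implicit merge runs without an error but returns an empty DataFrame, which is why naming the join columns explicitly is usually the safer habit.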
Quick Reference
| Parameter | Description | Example |
|---|---|---|
| left | First DataFrame to merge | pd.merge(left, right, ...) |
| right | Second DataFrame to merge | pd.merge(left, right, ...) |
| on | Column(s) to join on (must exist in both) | 'key' |
| left_on | Column(s) from left DataFrame if names differ | 'key1' |
| right_on | Column(s) from right DataFrame if names differ | 'key2' |
| how | Type of merge: 'inner' keeps only matching rows | 'inner' |
Key Takeaways
Use pd.merge() with how='inner' to keep only rows with matching keys in both DataFrames.
Specify the 'on' parameter to define the join column(s) when names match in both DataFrames.
Use 'left_on' and 'right_on' if join columns have different names in each DataFrame.
Inner merge returns the intersection of data, excluding non-matching rows.
Always check column names and data types to avoid merge errors.
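As a sketch of the data type check, assume two hypothetical DataFrames, orders and prices, whose id columns hold integers in one and strings in the other. Merging them as-is typically fails or matches nothing, depending on the pandas version, so the example casts one side first.

```python
import pandas as pd

# Hypothetical data: 'id' is numeric in orders but stored as text in prices
orders = pd.DataFrame({'id': [1, 2, 3], 'item': ['pen', 'ink', 'pad']})
prices = pd.DataFrame({'id': ['2', '3', '4'], 'price': [1.5, 2.0, 3.0]})

# Inspect the key dtypes before merging
print(orders['id'].dtype, prices['id'].dtype)

# Cast one side so both key columns share a dtype, then merge
prices['id'] = prices['id'].astype('int64')
merged = pd.merge(orders, prices, on='id', how='inner')
print(merged)
```

After the cast, ids 2 and 3 match and the merge returns two rows.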