How to Use Merge with Indicator in pandas for Data Comparison
Use
pandas.merge() with the indicator=True parameter to add a special column that shows the origin of each row in the merged DataFrame. This column helps identify if a row came from the left DataFrame, right DataFrame, or both.Syntax
The merge() function combines two DataFrames based on common columns or indices. Adding indicator=True creates a new column that shows the source of each row after merging.
left: The first DataFrame.right: The second DataFrame.how: Type of merge - 'left', 'right', 'inner', or 'outer'.on: Column(s) to join on.indicator=True: Adds a column showing merge origin.indicatororindicator_name(optional): Custom name for the indicator column.
python
pd.merge(left, right, how='inner', on=None, indicator=True)
Example
This example shows how to merge two DataFrames and use the indicator column to see which rows come from which DataFrame.
python
import pandas as pd # Create two sample DataFrames df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]}) df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]}) # Merge with indicator merged = pd.merge(df1, df2, on='key', how='outer', indicator=True) print(merged)
Output
key value1 value2 _merge
0 A 1.0 NaN left_only
1 B 2.0 4.0 both
2 C 3.0 5.0 both
3 D NaN 6.0 right_only
Common Pitfalls
One common mistake is forgetting to set indicator=True, so you don't get the source column. Another is not specifying the how parameter correctly, which affects which rows appear.
Also, if you want a custom name for the indicator column, you must use indicator='your_name'. Otherwise, it defaults to _merge.
python
import pandas as pd df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]}) df2 = pd.DataFrame({'key': ['B', 'C'], 'val2': [3, 4]}) # Wrong: no indicator merged_wrong = pd.merge(df1, df2, on='key', how='outer') print('Without indicator:') print(merged_wrong) # Right: with indicator merged_right = pd.merge(df1, df2, on='key', how='outer', indicator=True) print('\nWith indicator:') print(merged_right)
Output
Without indicator:
key val1 val2
0 A 1.0 NaN
1 B 2.0 3.0
2 C NaN 4.0
With indicator:
key val1 val2 _merge
0 A 1.0 NaN left_only
1 B 2.0 3.0 both
2 C NaN 4.0 right_only
Quick Reference
| Parameter | Description |
|---|---|
| left | First DataFrame to merge |
| right | Second DataFrame to merge |
| how | 'left', 'right', 'inner', or 'outer' join type |
| on | Column(s) to join on |
| indicator | Set to True to add source column |
| indicator_name | Custom name for the indicator column (default '_merge') |
Key Takeaways
Use indicator=True in pd.merge() to add a column showing the source of each row.
The indicator column values are 'left_only', 'right_only', or 'both' depending on row origin.
Specify how='outer' to see all rows from both DataFrames with their source.
You can rename the indicator column with indicator parameter.
Without indicator=True, you won't know which DataFrame each row came from after merging.