0
0
PandasHow-ToBeginner · 3 min read

How to Use Merge with Indicator in pandas for Data Comparison

Use pandas.merge() with the indicator=True parameter to add a special column that shows the origin of each row in the merged DataFrame. This column helps identify if a row came from the left DataFrame, right DataFrame, or both.
📐

Syntax

The merge() function combines two DataFrames based on common columns or indices. Adding indicator=True creates a new column that shows the source of each row after merging.

  • left: The first DataFrame.
  • right: The second DataFrame.
  • how: Type of merge - 'left', 'right', 'inner', or 'outer'.
  • on: Column(s) to join on.
  • indicator=True: Adds a column showing merge origin.
  • indicator or indicator_name (optional): Custom name for the indicator column.
python
pd.merge(left, right, how='inner', on=None, indicator=True)
💻

Example

This example shows how to merge two DataFrames and use the indicator column to see which rows come from which DataFrame.

python
import pandas as pd

# Create two sample DataFrames
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

# Merge with indicator
merged = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(merged)
Output
key value1 value2 _merge 0 A 1.0 NaN left_only 1 B 2.0 4.0 both 2 C 3.0 5.0 both 3 D NaN 6.0 right_only
⚠️

Common Pitfalls

One common mistake is forgetting to set indicator=True, so you don't get the source column. Another is not specifying the how parameter correctly, which affects which rows appear.

Also, if you want a custom name for the indicator column, you must use indicator='your_name'. Otherwise, it defaults to _merge.

python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['B', 'C'], 'val2': [3, 4]})

# Wrong: no indicator
merged_wrong = pd.merge(df1, df2, on='key', how='outer')
print('Without indicator:')
print(merged_wrong)

# Right: with indicator
merged_right = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print('\nWith indicator:')
print(merged_right)
Output
Without indicator: key val1 val2 0 A 1.0 NaN 1 B 2.0 3.0 2 C NaN 4.0 With indicator: key val1 val2 _merge 0 A 1.0 NaN left_only 1 B 2.0 3.0 both 2 C NaN 4.0 right_only
📊

Quick Reference

ParameterDescription
leftFirst DataFrame to merge
rightSecond DataFrame to merge
how'left', 'right', 'inner', or 'outer' join type
onColumn(s) to join on
indicatorSet to True to add source column
indicator_nameCustom name for the indicator column (default '_merge')

Key Takeaways

Use indicator=True in pd.merge() to add a column showing the source of each row.
The indicator column values are 'left_only', 'right_only', or 'both' depending on row origin.
Specify how='outer' to see all rows from both DataFrames with their source.
You can rename the indicator column with indicator parameter.
Without indicator=True, you won't know which DataFrame each row came from after merging.