Counting duplicates in Pandas - Step-by-Step Execution

Concept Flow - Counting duplicates

Start with DataFrame

↓

Identify duplicate rows

↓

Count duplicates per row or overall

↓

Output counts or filtered DataFrame

↓

End

We start with a DataFrame, find which rows are duplicates, count them, and then output the counts or filtered data.
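The "count duplicates per row or overall" step in the flow can be read two ways. The sketch below shows one possible approach to both readings, using the same illustrative data as the execution sample; the variable names group_sizes, repeated, and overall are hypothetical and chosen only for this example.

```python
import pandas as pd

# Same illustrative data as the execution sample in this section.
df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': [5, 6, 6, 7, 7, 7]})

# Per-row view: how many times each distinct (A, B) combination occurs.
group_sizes = df.groupby(['A', 'B']).size()

# Keep only the combinations that occur more than once.
repeated = group_sizes[group_sizes > 1]

# Overall view: total number of rows that belong to a repeated combination.
overall = int(repeated.sum())

print(repeated)
print(overall)  # 5
```

Here the per-row counts are 2 for (2, 6) and 3 for (3, 7), and their total matches the value that df.duplicated(keep=False).sum() gives for the same data.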

Execution Sample

Pandas

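import pandas as pd

# Six rows: (1, 5) appears once, (2, 6) twice, (3, 7) three times
df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': [5, 6, 6, 7, 7, 7]})
# keep=False flags every row that has an identical copy elsewhere
dup_mask = df.duplicated(keep=False)
# True counts as 1, so the sum is the number of flagged rows
dup_counts = dup_mask.sum()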

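This code creates a six-row DataFrame and counts how many rows have at least one identical copy elsewhere in the frame. Because keep=False flags every member of a repeated group, the result is 5 (all rows except the unique first row), not the number of extra copies.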

Execution Table

Step	Action	DataFrame State	Duplicates Identified	Count Result
1	Create DataFrame	[{'A':1,'B':5},{'A':2,'B':6},{'A':2,'B':6},{'A':3,'B':7},{'A':3,'B':7},{'A':3,'B':7}]	None yet	None yet
2	Check duplicates with keep=False	Same as step 1	[False, True, True, True, True, True]	None yet
3	Sum True values for duplicates	Same as step 1	[False, True, True, True, True, True]	5
4	Output total duplicate count	Same as step 1	[False, True, True, True, True, True]	5

💡 All six rows checked; five of them belong to a repeated group, so the total duplicate count is 5

Variable Tracker

Variable	Start	After Step 2	After Step 3	Final
df	Empty	[{'A':1,'B':5},{'A':2,'B':6},{'A':2,'B':6},{'A':3,'B':7},{'A':3,'B':7},{'A':3,'B':7}]	Same	Same
dup_mask	None	[False, True, True, True, True, True]	Same	Same
dup_counts	None	None	5	5

Key Moments - 3 Insights

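Why does duplicated(keep=False) mark all duplicates as True, not just some?
With keep=False, no occurrence is treated as the "original", so every row that has an identical copy elsewhere is marked True. See step 2 of the Execution Table, where rows 2 through 6 are all True.

Why is the first row marked False even though the DataFrame contains duplicates?
The first row (A=1, B=5) has no identical copy anywhere in the DataFrame, so it is unique and stays False regardless of the keep setting.

How does sum() count duplicates from the boolean mask?
sum() treats True as 1 and False as 0, so summing the mask [False, True, True, True, True, True] gives 5, as shown in step 3.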
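To make the effect of keep and of sum() concrete, here is a small sketch that builds the mask under each keep setting for the same data. The names mask_all, mask_first, and mask_last are hypothetical and used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': [5, 6, 6, 7, 7, 7]})

# keep=False: every row of a repeated group is flagged.
mask_all = df.duplicated(keep=False)
# keep='first' (the default): the first occurrence of each group is not flagged.
mask_first = df.duplicated(keep='first')
# keep='last': the last occurrence of each group is not flagged.
mask_last = df.duplicated(keep='last')

print(mask_all.tolist())    # [False, True, True, True, True, True]
print(mask_first.tolist())  # [False, False, True, False, True, True]
print(mask_last.tolist())   # [False, True, False, True, True, False]

# sum() adds True as 1 and False as 0.
print(int(mask_all.sum()), int(mask_first.sum()), int(mask_last.sum()))  # 5 3 3
```

With keep=False the sum is the number of rows involved in any repetition (5); with 'first' or 'last' it is the number of extra copies beyond one per group (3).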

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table at step 2. What is the duplicate mask value for the third row?

A. False

B. True

C. None

D. Error

Concept Snapshot

Counting duplicates in pandas:
- Use df.duplicated(keep=False) to mark every row of a repeated group as True
- Sum the boolean mask to count those rows (True counts as 1)
- keep='first' (the default) leaves the first occurrence False; keep='last' leaves the last occurrence False
- Useful for finding repeated rows in data
- Returns a boolean Series that can be used for filtering or counting
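The snapshot mentions filtering as well as counting. As a minimal sketch of the filtering side, assuming the same example data, the mask can select the repeated rows, its negation can select the unique rows, and drop_duplicates() keeps one row per distinct combination. The names repeated_rows, unique_rows, and deduplicated are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': [5, 6, 6, 7, 7, 7]})

dup_mask = df.duplicated(keep=False)

# Rows that have at least one identical copy elsewhere.
repeated_rows = df[dup_mask]

# Rows that occur exactly once.
unique_rows = df[~dup_mask]

# One row per distinct (A, B) combination, keeping the first occurrence.
deduplicated = df.drop_duplicates()

print(len(repeated_rows))           # 5
print(unique_rows.index.tolist())   # [0]
print(deduplicated.index.tolist())  # [0, 1, 3]
```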

Full Transcript

We start with a DataFrame containing some repeated rows. Calling duplicated() with keep=False produces a boolean mask that is True for every row with an identical copy elsewhere and False for rows that occur only once. Summing this mask counts how many rows are duplicates, because each True counts as 1; for the example data the result is 5. Changing the keep parameter changes which occurrences are marked: keep='first' or keep='last' leaves one occurrence of each repeated row unmarked. This step-by-step trace shows how the mask is built and how the count follows from it.