
Counting duplicates in Pandas

Introduction

Counting duplicates helps you find repeated data in your table. This is useful to check data quality or understand patterns.

You want to see if a list of customer emails has repeats.
You need to find duplicate product IDs in your sales data.
You want to count how many times each word appears in a text dataset.
You want to check if survey responses have repeated answers.
Syntax
Pandas
DataFrame.duplicated(subset=None, keep='first')

DataFrame.drop_duplicates(subset=None, keep='first')

Series.value_counts()

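duplicated() returns a boolean Series that marks rows repeating an earlier row. With the default keep='first', the first occurrence is not marked.

drop_duplicates() returns a new DataFrame with the duplicate rows removed.

value_counts() counts how many times each unique value appears.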
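The three calls above can be tried side by side. This is a minimal sketch on a made-up `email` column (the data and variable names are assumptions for illustration); it also shows that summing the boolean mask gives the number of duplicate rows.

```python
import pandas as pd

# Hypothetical sample data: two of the four rows repeat an earlier row.
df = pd.DataFrame({"email": ["a@x.com", "b@x.com", "a@x.com", "a@x.com"]})

mask = df.duplicated()               # True for rows that repeat an earlier row
deduped = df.drop_duplicates()       # new DataFrame without the repeated rows
counts = df["email"].value_counts()  # occurrences of each unique email

# True counts as 1, so the sum of the mask is the number of duplicate rows.
n_duplicates = int(mask.sum())

print(n_duplicates)
print(len(df) - len(deduped))  # same number, computed from the row counts
```

With the default keep='first', both printed numbers agree because every row flagged by duplicated() is exactly a row that drop_duplicates() removes.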

Examples
Returns a boolean series marking duplicate rows as True except for the first occurrence.
Pandas
df.duplicated()
Counts how many times each unique value appears in a column.
Pandas
df['column'].value_counts()
Marks all duplicates in specified columns as True, including the first occurrences.
Pandas
df.duplicated(subset=['col1', 'col2'], keep=False)
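To see how the keep parameter changes the result, the sketch below runs duplicated() on the same rows with each of its three settings. The column values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: rows 0, 1 and 3 share the same (col1, col2) pair.
df = pd.DataFrame({"col1": [1, 1, 2, 1], "col2": ["a", "a", "b", "a"]})
cols = ["col1", "col2"]

first = df.duplicated(subset=cols, keep="first")   # first occurrence stays False
last = df.duplicated(subset=cols, keep="last")     # last occurrence stays False
every = df.duplicated(subset=cols, keep=False)     # all members of a group are True

# Put the three masks next to each other for comparison.
comparison = pd.DataFrame({"first": first, "last": last, "False": every})
print(comparison)
```

Row 2 is False in every column because its pair appears only once.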
Sample Program

This code creates a small table with names and ages. It marks duplicate rows and counts how many times each name appears.

Pandas
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Anna', 'Mike', 'Bob', 'Anna'],
        'Age': [25, 30, 25, 22, 30, 25]}
df = pd.DataFrame(data)

# Find duplicate rows
duplicates = df.duplicated()
print('Duplicate rows marked as True:')
print(duplicates)

# Count how many times each name appears
name_counts = df['Name'].value_counts()
print('\nCount of each name:')
print(name_counts)
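Output

The first print shows False for rows 0, 1 and 3 and True for rows 2, 4 and 5, because those three rows repeat an earlier Name and Age pair. The second print shows Anna 3, Bob 2 and Mike 1.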
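The sample program marks duplicates but does not reduce them to a single number. One possible way to get actual counts from the same data is sketched below; the names `n_dup_rows` and `repeated_names` are introduced here for illustration.

```python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Anna', 'Mike', 'Bob', 'Anna'],
        'Age': [25, 30, 25, 22, 30, 25]}
df = pd.DataFrame(data)

# Number of rows that repeat an earlier row (True sums as 1).
n_dup_rows = int(df.duplicated().sum())

# Keep only the names that occur more than once.
name_counts = df['Name'].value_counts()
repeated_names = name_counts[name_counts > 1]

print('Duplicate rows:', n_dup_rows)
print(repeated_names)
```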
Important Notes

Use keep=False in duplicated() to mark all duplicates as True, not just later ones.
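A common use of keep=False is to pull out every row that belongs to a duplicate group so the copies can be inspected together. The sketch below reuses the sample table; sorting is only there to place matching rows next to each other.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Bob", "Anna", "Mike", "Bob", "Anna"],
    "Age": [25, 30, 25, 22, 30, 25],
})

# keep=False flags every member of a duplicate group, including the first one.
in_group = df.duplicated(keep=False)

# Boolean indexing keeps only the flagged rows, so Mike (who appears once) drops out.
repeated_rows = df[in_group].sort_values(["Name", "Age"])
print(repeated_rows)
```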

value_counts() works on Series (one column) and on DataFrames to count unique row combinations (pandas 1.1+).
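As a sketch of the DataFrame form, calling value_counts() on the whole sample table counts each distinct (Name, Age) combination. This assumes pandas 1.1 or later; the result is a Series indexed by the column values.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Bob", "Anna", "Mike", "Bob", "Anna"],
    "Age": [25, 30, 25, 22, 30, 25],
})

# One entry per unique (Name, Age) row, sorted from most to least frequent.
row_counts = df.value_counts()
print(row_counts)

# Look up the count for one combination by its (Name, Age) pair.
anna_count = int(row_counts.loc[("Anna", 25)])
```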

Summary

Counting duplicates helps find repeated data in tables.

duplicated() returns True for each row that repeats another row and False otherwise.

value_counts() counts how often each value appears in a column.