What is Counting duplicates in Pandas?




Introduction

Counting duplicates shows you which rows or values are repeated in a table and how often. This is useful for checking data quality and for spotting patterns.

You want to see if a list of customer emails has repeats.

You need to find duplicate product IDs in your sales data.

You want to count how many times each word appears in a text dataset.

You want to check if survey responses have repeated answers.

Syntax

Pandas

DataFrame.duplicated(subset=None, keep='first')

DataFrame.drop_duplicates(subset=None, keep='first')

Series.value_counts()

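duplicated() returns a boolean Series that is True for each row that repeats another row.

drop_duplicates() returns the data with the repeated rows removed.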

value_counts() counts how many times each unique value appears.
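The three calls above can be compared side by side in a minimal sketch. The email column and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data: one email address appears twice
emails = pd.DataFrame(
    {"email": ["a@example.com", "b@example.com", "a@example.com"]}
)

# duplicated(): True for each row that repeats an earlier row
dup_mask = emails.duplicated()

# drop_duplicates(): keeps only the first occurrence of each row
unique_rows = emails.drop_duplicates()

# value_counts(): number of occurrences of each value in one column
counts = emails["email"].value_counts()

print(dup_mask.tolist())          # [False, False, True]
print(len(unique_rows))           # 2
print(counts["a@example.com"])    # 2
```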

Examples

Returns a boolean Series that marks each duplicate row as True, except for its first occurrence.

Pandas

df.duplicated()

Counts how many times each unique value appears in a column.

Pandas

df['column'].value_counts()

Marks all duplicates in specified columns as True, including the first occurrences.

Pandas

df.duplicated(subset=['col1', 'col2'], keep=False)
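Because True counts as 1, summing the boolean result turns any of the calls above into a count. The sketch below assumes a hypothetical frame with columns col1, col2, and col3, and compares the default keep='first' with keep=False.

```python
import pandas as pd

# Hypothetical data: col1/col2 repeat, col3 makes every full row unique
df = pd.DataFrame({
    "col1": ["x", "x", "y", "x"],
    "col2": [1, 1, 2, 1],
    "col3": ["a", "b", "c", "d"],
})

# Full rows are all different, so nothing is flagged
full_row_dups = int(df.duplicated().sum())

# Default keep='first': only the later copies are flagged
later_copies = int(df.duplicated(subset=["col1", "col2"]).sum())

# keep=False: every row that belongs to a repeated group is flagged
all_copies = int(df.duplicated(subset=["col1", "col2"], keep=False).sum())

print(full_row_dups)  # 0
print(later_copies)   # 2
print(all_copies)     # 3
```

Which count is appropriate depends on the question: later_copies is the number of rows drop_duplicates() would remove, while all_copies is the number of rows involved in any repetition.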

Sample Program

This code creates a small table with names and ages. It marks duplicate rows and counts how many times each name appears.

Pandas

import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Anna', 'Mike', 'Bob', 'Anna'],
        'Age': [25, 30, 25, 22, 30, 25]}
df = pd.DataFrame(data)

# Find duplicate rows
duplicates = df.duplicated()
print('Duplicate rows marked as True:')
print(duplicates)

# Count how many times each name appears
name_counts = df['Name'].value_counts()
print('\nCount of each name:')
print(name_counts)

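Output

Rows 2, 4, and 5 are marked True because they repeat earlier rows; rows 0, 1, and 3 are marked False. The name counts are Anna 3, Bob 2, and Mike 1.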
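The sample program marks duplicates but stops short of a single total. One possible way to get that total, reusing the same names and ages, is sketched here; the variable names are illustrative only.

```python
import pandas as pd

# Same data as the sample program
df = pd.DataFrame({
    "Name": ["Anna", "Bob", "Anna", "Mike", "Bob", "Anna"],
    "Age": [25, 30, 25, 22, 30, 25],
})

# Rows that repeat an earlier row (first occurrences not counted)
n_extra = int(df.duplicated().sum())

# Rows that belong to any repeated group (first occurrences counted)
n_involved = int(df.duplicated(keep=False).sum())

# Names that appear more than once, with their counts
name_counts = df["Name"].value_counts()
repeated_names = name_counts[name_counts > 1]

print(n_extra)                   # 3
print(n_involved)                # 5
print(repeated_names.to_dict())  # {'Anna': 3, 'Bob': 2}
```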

Important Notes

Use keep=False in duplicated() to mark all duplicates as True, not just later ones.

value_counts() works on a Series (one column). Since pandas 1.1 it also works on a DataFrame, where it counts how often each unique combination of row values appears.
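As a rough illustration of the DataFrame form, the sketch below counts whole-row combinations and keeps only those that occur more than once. It assumes pandas 1.1 or later and reuses the names and ages from the sample program.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Bob", "Anna", "Mike", "Bob", "Anna"],
    "Age": [25, 30, 25, 22, 30, 25],
})

# DataFrame.value_counts() counts each unique (Name, Age) combination
row_counts = df.value_counts()

# Keep only the combinations that occur more than once
repeated_rows = row_counts[row_counts > 1]

print(repeated_rows.to_dict())  # {('Anna', 25): 3, ('Bob', 30): 2}
```

The result is indexed by the row values themselves, so it shows which combinations repeat as well as how many times.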

Summary

Counting duplicates helps find repeated data in tables.

duplicated() marks duplicate rows as True and all other rows as False.

value_counts() counts how often each value appears in a column.