We use duplicated() to find repeated rows in a DataFrame. It makes it easy to spot exact copies or repeated records before cleaning the data.
duplicated() for finding duplicates in Pandas
DataFrame.duplicated(subset=None, keep='first')
subset lets you choose columns to check for duplicates. If None, all columns are checked.
keep decides which occurrence is not marked as a duplicate: 'first' (the default) marks every occurrence except the first as True, 'last' marks every occurrence except the last as True, and False marks all occurrences of a duplicated row as True.
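A quick sketch of how the three keep settings change the result; the tiny one-column DataFrame is an assumed sample used only for illustration:

```python
import pandas as pd

# 'a' appears twice (rows 0 and 2); 'b' is unique
s = pd.DataFrame({'x': ['a', 'b', 'a']})

first = s.duplicated()            # keep='first': only the second 'a' is True
last = s.duplicated(keep='last')  # keep='last': only the first 'a' is True
both = s.duplicated(keep=False)   # keep=False: both 'a' rows are True

print(first.tolist())  # [False, False, True]
print(last.tolist())   # [True, False, False]
print(both.tolist())   # [True, False, True]
```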
df.duplicated()
df.duplicated(subset=['Name', 'Age'])
df.duplicated(keep='last')
df.duplicated(keep=False)
The code below creates a small table of people with their age and city. It then finds duplicates in two ways: first by all columns, then by just Name and Age, marking all occurrences of each duplicate.
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Anna', 'Mike', 'Bob'],
        'Age': [25, 30, 25, 40, 30],
        'City': ['NY', 'LA', 'NY', 'Chicago', 'LA']}
df = pd.DataFrame(data)

# Find duplicates considering all columns
duplicates_all = df.duplicated()

# Find duplicates based on Name and Age only
duplicates_name_age = df.duplicated(subset=['Name', 'Age'], keep=False)

print('Duplicates (all columns):')
print(duplicates_all)
print('\nDuplicates (Name and Age, all duplicates marked):')
print(duplicates_name_age)
Use duplicated() before removing duplicates to understand your data better.
Remember that duplicated() returns a boolean Series, not the duplicate rows themselves.
Combine with drop_duplicates() to remove duplicates after finding them.
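The two tips above can be sketched together: the boolean Series works as a mask to view the duplicate rows themselves, and drop_duplicates() accepts the same subset and keep arguments as duplicated(). The small DataFrame here is an assumed sample:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Bob', 'Anna'],
                   'Age': [25, 30, 25]})

# Use the boolean Series as a mask to see the duplicate rows themselves
dupe_rows = df[df.duplicated()]

# Remove the duplicates after inspecting them; same subset/keep arguments
deduped = df.drop_duplicates(subset=['Name', 'Age'], keep='first')

print(dupe_rows)  # the second 'Anna' row (index 2)
print(deduped)    # the first two rows only
```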
duplicated() helps find repeated rows in data.
You can check duplicates by all columns or specific columns.
It returns a boolean Series: True for rows marked as duplicates and False otherwise, with the keep setting controlling which occurrence counts as the original.