
Keeping first vs last vs none in Pandas

Introduction

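When a DataFrame contains repeated rows, you often want to keep only one copy of each, or none at all. The drop_duplicates method lets you choose which copy survives: the first, the last, or none. This is a common step when cleaning data and helps avoid double-counting.

Typical situations where this comes up: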

You have a list of customers and want to keep only their first purchase record.
You want to remove duplicate rows but keep the last entry for each group.
You want to drop every repeated row entirely and keep only the rows that never repeat.
You are cleaning survey data and want to keep only one response per person.
You want to prepare data for analysis by removing repeated entries.
Syntax
Pandas
DataFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)

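subset: a column label or list of column labels to compare when looking for duplicates. By default (None), all columns are compared.

keep: which copy of each duplicated row to keep. It can be 'first', 'last', or False (drop every copy).

ignore_index: if True, the resulting rows are renumbered 0, 1, 2, and so on, instead of keeping their original index labels.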
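To see subset, keep and ignore_index working together, here is a minimal sketch of the "first purchase per customer" case from the introduction. The purchases table and its customer, date and amount columns are hypothetical data made up for illustration. Sorting by date first is what makes "first" mean earliest and "last" mean latest.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase.
purchases = pd.DataFrame({
    'customer': ['ann', 'bob', 'ann', 'cy', 'bob'],
    'date': ['2024-01-03', '2024-01-05', '2024-02-10', '2024-02-11', '2024-03-01'],
    'amount': [20, 35, 15, 50, 10],
})

# Sort by date so "first" is the earliest purchase and "last" the latest.
purchases = purchases.sort_values('date')

# Only the 'customer' column decides what counts as a duplicate;
# ignore_index=True renumbers the surviving rows 0, 1, 2, ...
first_purchase = purchases.drop_duplicates(subset=['customer'], keep='first', ignore_index=True)
last_purchase = purchases.drop_duplicates(subset=['customer'], keep='last', ignore_index=True)

print(first_purchase)
print(last_purchase)
```

The surviving rows stay in their original order, so last_purchase lists ann, cy, bob: the order in which each customer's final purchase appears.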

Examples
Keeps the first occurrence of each duplicated row and removes the later copies.
Pandas
df.drop_duplicates(keep='first')
Keeps the last occurrence of each duplicated row and removes the earlier copies.
Pandas
df.drop_duplicates(keep='last')
Removes every copy of each duplicated row, keeping only rows that appear exactly once.
Pandas
df.drop_duplicates(keep=False)
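One way to understand what keep does is through DataFrame.duplicated, a related pandas method that accepts the same keep argument and returns a boolean mask marking the rows treated as duplicates. The sketch below, using a small made-up one-column DataFrame, checks that drop_duplicates(keep=k) returns the same rows as filtering out that mask.

```python
import pandas as pd

# Made-up data: 'a' appears three times, 'b' and 'c' once each.
df = pd.DataFrame({'key': ['a', 'b', 'a', 'c', 'a']})

kept_index = {}
for keep in ['first', 'last', False]:
    # True marks a row that drop_duplicates(keep=keep) would remove.
    mask = df.duplicated(keep=keep)
    kept = df.drop_duplicates(keep=keep)
    # Dropping duplicates is the same as keeping the rows where mask is False.
    assert kept.equals(df[~mask])
    kept_index[str(keep)] = kept.index.tolist()
    print(keep, kept_index[str(keep)])
```

With keep='first' the later 'a' rows are marked, with keep='last' the earlier ones are, and with keep=False all three 'a' rows are marked.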
Sample Program

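The program below builds a small table of names and scores and applies each keep option. Only the two ('Anna', 85) rows are exact duplicates. The two Bob rows have different scores, so they are not duplicates when all columns are compared.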

Pandas
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Anna', 'Bob', 'Cathy'],
        'Score': [85, 90, 85, 95, 88]}
df = pd.DataFrame(data)

print('Original DataFrame:')
print(df)

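# keep='first': the first copy of each duplicated row survives
print('\nKeep first occurrence:')
print(df.drop_duplicates(keep='first'))

# keep='last': the last copy of each duplicated row survives
print('\nKeep last occurrence:')
print(df.drop_duplicates(keep='last'))

# keep=False: every copy of a duplicated row is dropped
print('\nKeep none (drop all duplicated rows):')
print(df.drop_duplicates(keep=False))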
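To see exactly which rows each call keeps, it helps to compare the index labels that survive. This sketch reuses the sample data and collects the index of each result.

```python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Anna', 'Bob', 'Cathy'],
        'Score': [85, 90, 85, 95, 88]}
df = pd.DataFrame(data)

# With no subset, a row is a duplicate only if every column matches,
# so only rows 0 and 2 ('Anna', 85) are involved.
first_rows = df.drop_duplicates(keep='first').index.tolist()
last_rows = df.drop_duplicates(keep='last').index.tolist()
none_rows = df.drop_duplicates(keep=False).index.tolist()

print(first_rows)  # [0, 1, 3, 4]
print(last_rows)   # [1, 2, 3, 4]
print(none_rows)   # [1, 3, 4]
```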
Important Notes

Using keep=False removes every row that has at least one duplicate, including the first and last copies, so only rows that appear exactly once remain.

To compare only some columns, pass them to subset, for example subset=['Name']. Rows then count as duplicates whenever those columns match, even if other columns differ.
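Applied to the sample data, restricting the comparison to the Name column changes the result: both Bob rows now count as duplicates even though their scores differ. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Bob', 'Anna', 'Bob', 'Cathy'],
                   'Score': [85, 90, 85, 95, 88]})

# Only 'Name' is compared, so Anna and Bob each appear twice.
by_name_first = df.drop_duplicates(subset=['Name'], keep='first')
by_name_last = df.drop_duplicates(subset=['Name'], keep='last')
by_name_none = df.drop_duplicates(subset=['Name'], keep=False)

print(by_name_first)
print(by_name_last)
print(by_name_none)
```

Here keep='last' keeps Bob's score of 95 rather than 90, and keep=False leaves only Cathy, the one name that never repeats.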

Setting inplace=True modifies the original DataFrame directly and returns None instead of a new DataFrame.
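A common mistake with inplace=True is assigning its return value, which is None. This minimal sketch on a small made-up DataFrame shows the difference.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Bob', 'Anna'], 'Score': [85, 90, 85]})

# With inplace=True, df itself is changed and the call returns None.
result = df.drop_duplicates(keep='first', inplace=True)

print(result)   # None
print(len(df))  # 2
```

Reassigning the result, as in df = df.drop_duplicates(), reaches the same end state without inplace and keeps the call usable in method chains.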

Summary

keep='first' keeps the first occurrence of each duplicated row and removes later copies.

keep='last' keeps the last occurrence of each duplicated row and removes earlier copies.

keep=False removes every copy of each duplicated row, keeping only rows that appear exactly once.