
Keeping first vs last vs none in Pandas - Visual Side-by-Side Comparison


Concept Flow - Keeping first vs last vs none

Start with DataFrame

↓

Identify duplicates

↓

Choose keep option

↓

Keep first occurrence, last occurrence, or none

↓

Return cleaned DataFrame

This flow shows how pandas identifies duplicate rows and then keeps the first occurrence, the last occurrence, or none of the duplicated rows, depending on the chosen keep option.
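The flow above can be sketched as two explicit steps: build a boolean mask of duplicates, then filter with it. This is only an illustration, not a description of pandas internals. The helper name `drop_by_mask` is hypothetical; `DataFrame.duplicated` and `DataFrame.drop_duplicates` are the real pandas methods.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': ['x', 'y', 'y', 'z', 'z', 'z']})

def drop_by_mask(frame, keep):
    # Hypothetical helper for illustration only.
    # Step 1: mark rows that count as duplicates under this keep option.
    mask = frame.duplicated(keep=keep)
    # Step 2: keep only the rows that are NOT marked.
    return frame[~mask]

for keep in ('first', 'last', False):
    manual = drop_by_mask(df, keep)
    builtin = df.drop_duplicates(keep=keep)
    # Both approaches should select the same rows with the same index labels.
    print(keep, manual.equals(builtin))
```

Printing `df.duplicated(keep=...)` on its own is a quick way to see which rows each option would remove before actually dropping anything.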

Execution Sample

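import pandas as pd

# Rows (2, 'y') and (3, 'z') each appear more than once
df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': ['x', 'y', 'y', 'z', 'z', 'z']})
# Keep the first occurrence of each duplicated row
df_first = df.drop_duplicates(keep='first')
# Keep the last occurrence of each duplicated row
df_last = df.drop_duplicates(keep='last')
# Drop every row that has a duplicate, keeping none of them
df_none = df.drop_duplicates(keep=False)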

This code creates a DataFrame that contains duplicate rows and calls drop_duplicates three times: keeping the first occurrence, keeping the last occurrence, and keeping none of the duplicated rows.
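To see what each call returns, the results can be inspected together with their index labels. The sketch below repeats the sample so that it runs on its own; the expected values in the comments follow from the six-row DataFrame defined here.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': ['x', 'y', 'y', 'z', 'z', 'z']})
df_first = df.drop_duplicates(keep='first')
df_last = df.drop_duplicates(keep='last')
df_none = df.drop_duplicates(keep=False)

# The retained rows keep their original index labels.
print(df_first.index.tolist())  # [0, 1, 3]
print(df_last.index.tolist())   # [0, 2, 5]
print(df_none.index.tolist())   # [0]

# Same row values for 'first' and 'last', because the duplicates are identical.
print(df_first.values.tolist() == df_last.values.tolist())  # True
```

The index labels are the only visible difference between the 'first' and 'last' results for this data.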

Execution Table

Step	DataFrame State	Action	Resulting DataFrame Rows
1	[1,x], [2,y], [2,y], [3,z], [3,z], [3,z]	Identify duplicates	Duplicates found at index 1, 2 ([2,y]) and index 3, 4, 5 ([3,z])
2	[1,x], [2,y], [2,y], [3,z], [3,z], [3,z]	drop_duplicates(keep='first')	[1,x], [2,y], [3,z] (index 0, 1, 3)
3	[1,x], [2,y], [2,y], [3,z], [3,z], [3,z]	drop_duplicates(keep='last')	[1,x], [2,y], [3,z] (index 0, 2, 5)
4	[1,x], [2,y], [2,y], [3,z], [3,z], [3,z]	drop_duplicates(keep=False)	[1,x] (index 0)
5		End of operations	No more duplicates to process

💡 All duplicates handled according to keep option; process ends.

Variable Tracker

Variable	Start	After Step 2	After Step 3	After Step 4	Final
df	[1,x], [2,y], [2,y], [3,z], [3,z], [3,z]	[1,x], [2,y], [2,y], [3,z], [3,z], [3,z]	[1,x], [2,y], [2,y], [3,z], [3,z], [3,z]	[1,x], [2,y], [2,y], [3,z], [3,z], [3,z]	[1,x], [2,y], [2,y], [3,z], [3,z], [3,z]
df_first	N/A	[1,x], [2,y], [3,z]	[1,x], [2,y], [3,z]	[1,x], [2,y], [3,z]	[1,x], [2,y], [3,z]
df_last	N/A	N/A	[1,x], [2,y], [3,z]	[1,x], [2,y], [3,z]	[1,x], [2,y], [3,z]
df_none	N/A	N/A	N/A	[1,x]	[1,x]
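The tracker shows df staying the same in every column. A minimal check of that behaviour, assuming the default call without `inplace=True`:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': ['x', 'y', 'y', 'z', 'z', 'z']})
rows_before = len(df)

# drop_duplicates returns a new DataFrame; df itself is not modified.
df_none = df.drop_duplicates(keep=False)

print(rows_before, len(df), len(df_none))  # 6 6 1
print(df_none is df)                       # False
```

Assigning the result to a new name, as the sample does, keeps the original data available for comparison.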

Key Moments - 3 Insights

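Why does drop_duplicates(keep='first') keep the first occurrence and remove later ones?
With keep='first', the first occurrence of each row is treated as the one to retain, and every later identical row is marked as a duplicate and removed.

What happens when keep=False is used?
Every row that has a duplicate is removed, including its first occurrence, so only rows that were unique from the start remain (here, [1,x]).

Why do drop_duplicates(keep='first') and drop_duplicates(keep='last') sometimes return the same rows?
When the duplicated rows are identical in every column, the first and last occurrence hold the same values, so both results contain the same row values and differ only in which index labels are retained.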
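The third question can be explored with a small sketch. In the lesson's DataFrame the duplicated rows match in every column, so 'first' and 'last' yield the same values. The hypothetical data below is not part of the lesson: it uses the `subset` parameter of drop_duplicates to compare only column A, while column B differs between the matching rows.

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 share A == 2 but differ in B.
df = pd.DataFrame({'A': [1, 2, 2], 'B': ['x', 'y1', 'y2']})

# Only column A is used to decide what counts as a duplicate.
first = df.drop_duplicates(subset='A', keep='first')
last = df.drop_duplicates(subset='A', keep='last')
none = df.drop_duplicates(subset='A', keep=False)

print(first['B'].tolist())  # ['x', 'y1']
print(last['B'].tolist())   # ['x', 'y2']
print(none['B'].tolist())   # ['x']
```

Here the choice between 'first' and 'last' changes the returned values, not just the index labels.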

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table at step 4. What rows remain after drop_duplicates(keep=False)?

A. [2,y], [3,z]

B. [1,x]

C. [1,x], [2,y], [3,z]

D. All rows remain

Concept Snapshot

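The pandas drop_duplicates method removes duplicate rows.
keep='first' keeps the first occurrence of each duplicated row.
keep='last' keeps the last occurrence of each duplicated row.
keep=False removes every row that has a duplicate, including the first occurrence.
The keep option is useful in data cleaning because it controls which of the duplicated rows survive.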

Full Transcript

This lesson shows how pandas drop_duplicates works with the three keep options: 'first', 'last', and False. We start with a DataFrame containing duplicate rows. The method identifies duplicates and then removes them based on the keep parameter. If keep='first', it keeps the first occurrence and removes later duplicates. If keep='last', it keeps the last occurrence and removes earlier duplicates. If keep=False, it removes all rows that have duplicates, leaving only rows that were unique to begin with. The execution table traces these steps with the DataFrame states and results. The variable tracker shows which result DataFrames exist after each operation, while the original df stays unchanged. Key moments clarify common confusions about how duplicates are handled. The visual quiz tests understanding by asking which rows remain after an operation. This helps beginners see exactly how pandas manages duplicates step by step.