
Keeping first vs last vs none in Pandas - Visual Side-by-Side Comparison

Concept Flow - Keeping first vs last vs none
Start with DataFrame
Identify duplicates
Choose keep option
Keep first, last, or no duplicates
Return cleaned DataFrame
This flow shows how pandas identifies duplicates and then keeps either the first, last, or no duplicates based on the chosen option.
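The "identify duplicates" step can be inspected directly with DataFrame.duplicated, which returns a boolean mask instead of dropping anything. The sketch below reuses the lesson's sample data; the mask variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': ['x', 'y', 'y', 'z', 'z', 'z']})

# True marks a row that drop_duplicates would remove for that keep option.
mask_first = df.duplicated(keep='first')
mask_last = df.duplicated(keep='last')
mask_none = df.duplicated(keep=False)

print(mask_first.tolist())  # [False, False, True, False, True, True]
print(mask_last.tolist())   # [False, True, False, True, True, False]
print(mask_none.tolist())   # [False, True, True, True, True, True]
```

Reading the masks side by side shows that the three options agree on which rows belong to a duplicate group and differ only in which member of the group, if any, is spared.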
Execution Sample
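import pandas as pd

# Sample data: (2, 'y') appears twice and (3, 'z') appears three times.
df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': ['x', 'y', 'y', 'z', 'z', 'z']})
# Keep the first row of each duplicate group.
df_first = df.drop_duplicates(keep='first')
# Keep the last row of each duplicate group.
df_last = df.drop_duplicates(keep='last')
# Drop every row that has a duplicate.
df_none = df.drop_duplicates(keep=False)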
This code creates a DataFrame with duplicates and shows how to keep first, last, or no duplicates.
Execution Table
| Step | DataFrame State | Action | Resulting DataFrame Rows |
|---|---|---|---|
| 1 | [1,x], [2,y], [2,y], [3,z], [3,z], [3,z] | Identify duplicates | Duplicates found at rows 2,3 (2,y) and rows 4,5,6 (3,z), counting rows from 1 |
| 2 | [1,x], [2,y], [2,y], [3,z], [3,z], [3,z] | drop_duplicates(keep='first') | [1,x], [2,y], [3,z] |
| 3 | [1,x], [2,y], [2,y], [3,z], [3,z], [3,z] | drop_duplicates(keep='last') | [1,x], [2,y], [3,z] |
| 4 | [1,x], [2,y], [2,y], [3,z], [3,z], [3,z] | drop_duplicates(keep=False) | [1,x] |
| 5 | - | End of operations | No more duplicates to process |
💡 All duplicates handled according to keep option; process ends.
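Steps 2 and 3 list the same values, but the rows that survive are not the same rows. A small sketch, reusing the lesson's data, that prints the index labels each option retains:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': ['x', 'y', 'y', 'z', 'z', 'z']})
df_first = df.drop_duplicates(keep='first')
df_last = df.drop_duplicates(keep='last')

# The original index labels are preserved, so they reveal which rows survived.
print(df_first.index.tolist())  # [0, 1, 3]
print(df_last.index.tolist())   # [0, 2, 5]

# Ignoring the index, the two results hold identical values.
same_values = df_first.reset_index(drop=True).equals(df_last.reset_index(drop=True))
print(same_values)  # True
```

Note that pandas index labels start at 0, so label 3 here is the fourth row of the original DataFrame.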
Variable Tracker
| Variable | Start | After Step 2 | After Step 3 | After Step 4 | Final |
|---|---|---|---|---|---|
| df | [1,x], [2,y], [2,y], [3,z], [3,z], [3,z] | [1,x], [2,y], [2,y], [3,z], [3,z], [3,z] | [1,x], [2,y], [2,y], [3,z], [3,z], [3,z] | [1,x], [2,y], [2,y], [3,z], [3,z], [3,z] | [1,x], [2,y], [2,y], [3,z], [3,z], [3,z] |
| df_first | N/A | [1,x], [2,y], [3,z] | [1,x], [2,y], [3,z] | [1,x], [2,y], [3,z] | [1,x], [2,y], [3,z] |
| df_last | N/A | N/A | [1,x], [2,y], [3,z] | [1,x], [2,y], [3,z] | [1,x], [2,y], [3,z] |
| df_none | N/A | N/A | N/A | [1,x] | [1,x] |
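The df row never changes in the tracker because drop_duplicates, called as in this lesson, returns a new DataFrame and leaves the original alone. A quick check of the row counts, using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': ['x', 'y', 'y', 'z', 'z', 'z']})
df_first = df.drop_duplicates(keep='first')
df_last = df.drop_duplicates(keep='last')
df_none = df.drop_duplicates(keep=False)

# The original still has all six rows; each result is a separate object.
print(len(df))        # 6
print(len(df_first))  # 3
print(len(df_last))   # 3
print(len(df_none))   # 1
```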
Key Moments - 3 Insights
Why does drop_duplicates(keep='first') keep the first occurrence and remove later ones?
Conceptually, pandas treats the first occurrence of each row, in top-to-bottom order, as the original and marks every later identical row as a duplicate to remove, as shown in execution_table step 2.
What happens when keep=False is used?
All rows that have duplicates anywhere are removed, leaving only unique rows, as shown in execution_table step 4 where only [1,x] remains.
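One way to see this is to build the same result by hand from a boolean mask. The sketch below assumes the lesson's sample data; df_masked is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': ['x', 'y', 'y', 'z', 'z', 'z']})

df_none = df.drop_duplicates(keep=False)

# duplicated(keep=False) flags every member of a duplicate group,
# so negating the mask keeps only rows that occur exactly once.
df_masked = df[~df.duplicated(keep=False)]

print(df_none.equals(df_masked))   # True
print(df_none.values.tolist())     # [[1, 'x']]
```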
Why do drop_duplicates(keep='first') and drop_duplicates(keep='last') sometimes return the same rows?
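With the default settings, rows count as duplicates only when they match in every column, so each duplicate group collapses to one row with the same values whether the first or the last copy is kept, as seen in steps 2 and 3. Only the retained index labels differ: 0, 1, 3 for keep='first' and 0, 2, 5 for keep='last'.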
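A case where the two options do return different values arises when duplicates are judged on only some columns through the subset parameter. The sketch below adds a hypothetical column C, which is not part of the lesson's data, to make the difference visible.

```python
import pandas as pd

# Hypothetical data: column C differs inside each duplicate group of A and B.
df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z'],
    'C': [10, 20, 21, 30, 31, 32],
})

# Only A and B decide what counts as a duplicate; C is carried along.
first = df.drop_duplicates(subset=['A', 'B'], keep='first')
last = df.drop_duplicates(subset=['A', 'B'], keep='last')

print(first['C'].tolist())  # [10, 20, 30]
print(last['C'].tolist())   # [10, 21, 32]
```

Here the choice between 'first' and 'last' decides which C value survives for each (A, B) pair.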
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 4. What rows remain after drop_duplicates(keep=False)?
A. [2,y], [3,z]
B. [1,x]
C. [1,x], [2,y], [3,z]
D. All rows remain
💡 Hint
Check the Resulting DataFrame Rows column at step 4 in execution_table.
At which step does drop_duplicates(keep='last') produce its result?
A. Step 3
B. Step 4
C. Step 2
D. Step 5
💡 Hint
Look at the Action column in execution_table for drop_duplicates(keep='last').
If the DataFrame had no duplicates, what would drop_duplicates(keep='first') return?
A. An empty DataFrame
B. Only the first row
C. The original DataFrame unchanged
D. Only the last row
💡 Hint
Consider what happens when no duplicates are found in the initial DataFrame state.
Concept Snapshot
pandas drop_duplicates method removes duplicate rows.
keep='first' keeps the first occurrence.
keep='last' keeps the last occurrence.
keep=False removes every row that has a duplicate, including the first occurrence.
Useful to clean data by controlling which duplicates to keep.
Full Transcript
This lesson shows how pandas drop_duplicates works with keep options: first, last, and none. We start with a DataFrame containing duplicate rows. The method identifies duplicates and then removes them based on the keep parameter. If keep='first', it keeps the first occurrence and removes later duplicates. If keep='last', it keeps the last occurrence and removes earlier duplicates. If keep=False, it removes all rows that have duplicates, leaving only unique rows. The execution table traces these steps with the DataFrame states and results. Variable tracking shows how the DataFrames change after each operation. Key moments clarify common confusions about how duplicates are handled. The visual quiz tests understanding by asking about the resulting rows after each operation. This helps beginners see exactly how pandas manages duplicates step-by-step.