
Keeping first vs last vs none in Pandas - Hands-On Comparison

Keeping First vs Last vs None in pandas
📖 Scenario: You work in a store's data team. You have a list of sales records in which some entries for the same product are duplicated. You want to clean the data in one of three ways: keep the first occurrence of each duplicated record, keep the last occurrence, or drop every record that has a duplicate.
🎯 Goal: Learn how to use pandas drop_duplicates() with keep='first', keep='last', and keep=False options to control which duplicates to keep or remove.
📋 What You'll Learn
Create a pandas DataFrame called sales with given data
Create a variable called subset_cols listing the columns used to identify duplicates
Use drop_duplicates() with keep='first' to keep the first row of each duplicate group
Use drop_duplicates() with keep='last' to keep the last row of each duplicate group
Use drop_duplicates() with keep=False to remove every row that has a duplicate
Print the resulting DataFrames
💡 Why This Matters
🌍 Real World
Cleaning duplicate sales records is common in retail data analysis to ensure accurate reporting and inventory management.
💼 Career
Data analysts and data scientists often need to remove or handle duplicates in datasets before analysis or modeling.
1
Create the sales DataFrame
Create a pandas DataFrame called sales with these exact columns and rows:
{'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Cherry', 'Apple'], 'Price': [100, 80, 100, 90, 120, 100], 'Quantity': [5, 7, 5, 8, 10, 5]}
Need a hint?

Use pd.DataFrame() with a dictionary of lists for columns.
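
One possible solution for this step, as a minimal sketch. It assumes pandas is installed and imported under the conventional alias pd; the data is copied from the dictionary given above.

```python
import pandas as pd

# Build the DataFrame from a dictionary of equal-length lists,
# one list per column.
sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Cherry', 'Apple'],
    'Price': [100, 80, 100, 90, 120, 100],
    'Quantity': [5, 7, 5, 8, 10, 5],
})

# Six rows, three columns; the default integer index runs from 0 to 5.
print(sales)
```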

2
Create subset_cols variable for duplicate check
Create a variable called subset_cols and set it to a list containing the column names 'Product' and 'Price', so that two rows count as duplicates whenever they match on both of these columns.
Need a hint?

Just assign the list ['Product', 'Price'] to subset_cols.
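
A sketch of this step, with one extra check that is not part of the exercise: the name dup_mask is hypothetical and is only used here to show which rows pandas treats as repeats when comparing on subset_cols.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Cherry', 'Apple'],
    'Price': [100, 80, 100, 90, 120, 100],
    'Quantity': [5, 7, 5, 8, 10, 5],
})

# Columns used to decide whether two rows are duplicates.
subset_cols = ['Product', 'Price']

# dup_mask is an illustrative name, not part of the exercise.
# duplicated() marks each row that repeats an earlier row on subset_cols.
dup_mask = sales.duplicated(subset=subset_cols)
print(dup_mask.tolist())  # [False, False, True, False, False, True]
```

Only the (Apple, 100) rows repeat. The two Banana rows differ in Price, so they are not duplicates under this subset.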

3
Remove duplicates keeping the first occurrence
Create a new DataFrame called keep_first by calling sales.drop_duplicates() with subset=subset_cols and keep='first', so that only the first row of each duplicate group is kept.
Need a hint?

Use drop_duplicates() with subset=subset_cols and keep='first'.
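
One way to write this step, sketched with the same data as above so the block runs on its own:

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Cherry', 'Apple'],
    'Price': [100, 80, 100, 90, 120, 100],
    'Quantity': [5, 7, 5, 8, 10, 5],
})
subset_cols = ['Product', 'Price']

# keep='first' retains the first row of each (Product, Price) group
# and drops the later repeats. The original index labels are preserved.
keep_first = sales.drop_duplicates(subset=subset_cols, keep='first')

print(keep_first.index.tolist())  # [0, 1, 3, 4]
```

drop_duplicates() returns a new DataFrame and leaves sales unchanged. Chaining .reset_index(drop=True) would renumber the rows if a contiguous index is wanted.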

4
Keep the last occurrence, then remove every duplicated row
Create two new DataFrames:
1. keep_last by dropping duplicates with subset=subset_cols and keep='last', which keeps only the last row of each duplicate group.
2. keep_none by dropping duplicates with subset=subset_cols and keep=False, which removes every row that has a duplicate and keeps none of them.
Then print keep_first, keep_last, and keep_none.
Need a hint?

Use drop_duplicates() with keep='last' and keep=False. Then print all three DataFrames.
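
A sketch that puts all three variants side by side. It rebuilds sales, subset_cols, and keep_first from the earlier steps so the block is self-contained.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Cherry', 'Apple'],
    'Price': [100, 80, 100, 90, 120, 100],
    'Quantity': [5, 7, 5, 8, 10, 5],
})
subset_cols = ['Product', 'Price']

# First row of each duplicate group survives.
keep_first = sales.drop_duplicates(subset=subset_cols, keep='first')
# Last row of each duplicate group survives.
keep_last = sales.drop_duplicates(subset=subset_cols, keep='last')
# keep=False drops every row that belongs to a duplicate group.
keep_none = sales.drop_duplicates(subset=subset_cols, keep=False)

print(keep_first)  # index labels 0, 1, 3, 4
print(keep_last)   # index labels 1, 3, 4, 5
print(keep_none)   # index labels 1, 3, 4
```

With this data the three results differ only in how the (Apple, 100) group is handled: keep='first' retains row 0, keep='last' retains row 5, and keep=False removes Apple entirely.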