Pandas, data, ~15 mins

duplicated() for finding duplicates in Pandas - Mini Project: Build & Apply

Using duplicated() to Find Duplicate Rows in Data
📖 Scenario: You work in a store's data team. You have a list of sales records. Sometimes, the same sale is recorded twice by mistake. You want to find these duplicate sales to fix the data.
🎯 Goal: You will create a small sales data table, then use pandas duplicated() to find which rows are duplicates.
📋 What You'll Learn
Create a pandas DataFrame called sales with given sales data
Create a variable keep_option to decide which duplicates to mark
Use duplicated() on sales with keep=keep_option to find duplicates
Print the boolean Series showing duplicate rows
💡 Why This Matters
🌍 Real World
Duplicate data can cause errors in reports and decisions. Finding duplicates helps keep data clean and trustworthy.
💼 Career
Data analysts and scientists often clean data by identifying and handling duplicates to ensure accurate analysis.
Step 1: Create the sales DataFrame
Import pandas as pd and create a DataFrame called sales from exactly this dictionary of columns: {'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple'], 'Price': [1.0, 0.5, 1.0, 0.5, 1.0], 'Quantity': [10, 5, 10, 5, 10]}
Hint: Use pd.DataFrame() with a dictionary of lists for columns.
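
One possible way to complete this step is sketched below. It uses only the data given in the step; the final print of the shape is an extra check added here, not something the exercise asks for.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple'],
    'Price': [1.0, 0.5, 1.0, 0.5, 1.0],
    'Quantity': [10, 5, 10, 5, 10],
})

# Optional sanity check: 5 rows and 3 columns are expected.
print(sales.shape)
```

Rows 0, 2, and 4 hold the same Apple sale, and rows 1 and 3 hold the same Banana sale, so this table contains the repeated records the later steps look for.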

Step 2: Set the keep option for duplicates
Create a variable called keep_option and set it to the string 'first', so that every repeat of a row is marked as a duplicate except its first occurrence.
Hint: The keep parameter in duplicated() can be 'first', 'last', or False.
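
To see what each value of keep does, the sketch below applies all three to the sales data from step 1. The names flags_first, flags_last, and flags_all are illustrative only and are not part of the exercise; the exercise itself only needs keep_option.

```python
import pandas as pd

# Same sales data as in step 1.
sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple'],
    'Price': [1.0, 0.5, 1.0, 0.5, 1.0],
    'Quantity': [10, 5, 10, 5, 10],
})

# The value this step asks for.
keep_option = 'first'

# Compare the three accepted values of keep on the same data.
flags_first = sales.duplicated(keep='first').tolist()  # first occurrence is not flagged
flags_last = sales.duplicated(keep='last').tolist()    # last occurrence is not flagged
flags_all = sales.duplicated(keep=False).tolist()      # every repeated row is flagged

print(flags_first)  # [False, False, True, True, True]
print(flags_last)   # [True, True, True, False, False]
print(flags_all)    # [True, True, True, True, True]
```

With 'first', the original record of each sale stays unflagged, which suits the scenario of keeping one copy and fixing the accidental repeats.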

Step 3: Find duplicate rows using duplicated()
Create a variable called duplicates that stores the result of sales.duplicated(keep=keep_option) to find duplicate rows.
Hint: Call duplicated() on the DataFrame with the keep argument.
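
A minimal sketch of this step follows. The duplicates variable is the one the step asks for; duplicate_rows and num_duplicates are hypothetical extras added here to show how the boolean result might be used afterwards.

```python
import pandas as pd

# Same sales data and keep option as in the earlier steps.
sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple'],
    'Price': [1.0, 0.5, 1.0, 0.5, 1.0],
    'Quantity': [10, 5, 10, 5, 10],
})
keep_option = 'first'

# Boolean Series: True where a row repeats an earlier row in every column.
duplicates = sales.duplicated(keep=keep_option)

# Extra (not required by the exercise): use the Series as a row filter and count the hits.
duplicate_rows = sales[duplicates]
num_duplicates = int(duplicates.sum())

print(duplicate_rows)
print(num_duplicates)  # 3
```

Because duplicated() compares all columns by default, a row counts as a duplicate only when Product, Price, and Quantity all match an earlier row.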

Step 4: Print the duplicates boolean Series
Write a print statement to display the duplicates variable.
Hint: Use print(duplicates) to see which rows are duplicates.
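
Putting the four steps together, the whole mini project could look roughly like the sketch below. It assumes the same data and the 'first' option used throughout and adds nothing beyond what the steps describe.

```python
import pandas as pd

# Step 1: build the sales table.
sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple'],
    'Price': [1.0, 0.5, 1.0, 0.5, 1.0],
    'Quantity': [10, 5, 10, 5, 10],
})

# Step 2: choose which occurrence is left unflagged.
keep_option = 'first'

# Step 3: flag the rows that repeat an earlier row.
duplicates = sales.duplicated(keep=keep_option)

# Step 4: show the result, one True/False value per row index.
print(duplicates)
```

The printed Series should show False for rows 0 and 1 (the first Apple and Banana sales) and True for rows 2, 3, and 4 (their repeats).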