
Counting duplicates in Pandas - Mini Project: Build & Apply

📖 Scenario: You work in a small bookstore. You have a list of books sold today, but some books appear more than once because multiple copies were sold. You want to find out how many times each book was sold.
🎯 Goal: Build a small program using pandas to count how many times each book title appears in the sales list.
📋 What You'll Learn
Create a pandas DataFrame with a column named Book containing the exact book titles given.
Create a variable called count_duplicates that stores how many times each book title appears.
Print the count_duplicates variable to see the counts.
💡 Why This Matters
🌍 Real World
Counting duplicates helps businesses understand which products sell more and manage inventory better.
💼 Career
Data analysts often count duplicates to summarize data and find popular items or repeated entries.
Step 1: Create the sales data
Import pandas as pd and create a DataFrame called sales with one column named Book. The column should have these exact values in this order: 'Harry Potter', 'The Hobbit', 'Harry Potter', '1984', 'The Hobbit', 'The Hobbit'.
Hint: Use pd.DataFrame with a dictionary where the key is 'Book' and the value is the list of book titles.
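
One way this step might look, as a minimal sketch that assumes pandas is installed. The names sales and Book come from the instructions above; the final print is only there to inspect the result.

```python
import pandas as pd

# Build a one-column DataFrame from a dictionary:
# the key becomes the column name, the list becomes the rows.
sales = pd.DataFrame({
    "Book": [
        "Harry Potter",
        "The Hobbit",
        "Harry Potter",
        "1984",
        "The Hobbit",
        "The Hobbit",
    ]
})

# Inspect the DataFrame: six rows, one per copy sold.
print(sales)
```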

Step 2: Set up counting duplicates
Create a variable called count_duplicates that holds the number of times each book title appears in the sales DataFrame. Compute it by calling the value_counts() method on the Book column.
Hint: Use sales['Book'].value_counts() to count how many times each title appears.
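
A possible version of this step, sketched with the same sales data rebuilt so that the snippet runs on its own. value_counts() returns a Series whose index holds the distinct titles and whose values are the counts, sorted from most to least frequent by default.

```python
import pandas as pd

# Same sales data as in the first step, repeated so this snippet is self-contained.
sales = pd.DataFrame({
    "Book": ["Harry Potter", "The Hobbit", "Harry Potter",
             "1984", "The Hobbit", "The Hobbit"]
})

# Count the occurrences of each distinct title in the Book column.
count_duplicates = sales["Book"].value_counts()

# Individual counts can be looked up by title.
hobbit_sales = count_duplicates["The Hobbit"]
```

Note that value_counts() counts every title, including those that appear only once, so the result covers all books and not just the repeated ones.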

Step 3: Print the counts
Print the variable count_duplicates to display how many times each book title appears.
Hint: Use print(count_duplicates) to show the counts.
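
Put together with the earlier steps, printing might look like the sketch below. The data is rebuilt here so the snippet stands alone. The titles should appear with counts 3, 2 and 1, most frequent first; the exact header and footer lines of the printed Series depend on the pandas version, so they are not reproduced here.

```python
import pandas as pd

# Rebuild the sales data and the counts so this snippet is self-contained.
sales = pd.DataFrame({
    "Book": ["Harry Potter", "The Hobbit", "Harry Potter",
             "1984", "The Hobbit", "The Hobbit"]
})
count_duplicates = sales["Book"].value_counts()

# Display each title next to the number of times it was sold.
print(count_duplicates)
```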

Step 4: Filter books sold more than once
Create a variable called duplicates_more_than_one that filters count_duplicates to keep only books sold more than once (count > 1). Then print duplicates_more_than_one.
Hint: Use boolean indexing like count_duplicates[count_duplicates > 1] to filter.
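
A sketch of the filtering step, again with the data rebuilt so it can be run in isolation. The comparison count_duplicates > 1 produces a Series of True/False values, and indexing with it keeps only the rows marked True.

```python
import pandas as pd

# Rebuild the sales data and the counts so this snippet is self-contained.
sales = pd.DataFrame({
    "Book": ["Harry Potter", "The Hobbit", "Harry Potter",
             "1984", "The Hobbit", "The Hobbit"]
})
count_duplicates = sales["Book"].value_counts()

# Boolean mask: True for titles sold more than once.
sold_more_than_once = count_duplicates > 1

# Keep only the titles where the mask is True.
duplicates_more_than_one = count_duplicates[sold_more_than_once]

print(duplicates_more_than_one)
```

With this data, '1984' is dropped because it was sold only once, leaving 'The Hobbit' and 'Harry Potter'. The intermediate name sold_more_than_once is only for illustration; the one-line form from the hint does the same thing.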