Challenge - 5 Problems
Duplicate Detective
❓ Predict Output
intermediate
Counting duplicate rows in a DataFrame
What is the output of this code that counts duplicate rows in a pandas DataFrame?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})
count_duplicates = df.duplicated().sum()
print(count_duplicates)
💡 Hint
Remember that duplicated() by default marks every repeat of a row as True and leaves only the first occurrence as False.
Explanation
The DataFrame has 6 rows. The rows at index 2, 4, and 5 repeat an earlier row, so duplicated() returns True for these 3 rows and False for the first occurrences at index 0, 1, and 3. Summing the True values gives 3.
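To see where the 3 comes from, it can help to inspect the boolean mask before summing it. The sketch below reuses the DataFrame from the question; the names mask and flagged are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z'],
})

# duplicated() defaults to keep='first': the first occurrence stays False
mask = df.duplicated()

# Index labels of the rows flagged as duplicates
flagged = mask[mask].index.tolist()

print(mask.tolist())     # one boolean per row
print(flagged)           # which rows were flagged
print(int(mask.sum()))   # True counts as 1 when summed
```

Summing works because pandas treats True as 1 and False as 0.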
❓ Data Output
intermediate
Counting duplicates with subset columns
Given this DataFrame, what is the output of counting duplicates only based on column 'A'?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'z', 'z', 'y', 'z']
})
count_dup_subset = df.duplicated(subset=['A']).sum()
print(count_dup_subset)
💡 Hint
Duplicates are counted only by column 'A', ignoring column 'B'.
Explanation
Column 'A' holds [1, 2, 2, 3, 3, 3]. The first occurrences of each value are at indices 0, 1, and 3, so duplicated(subset=['A']) marks only the later repeats at indices 2, 4, and 5 as True. Column 'B' is ignored, and the sum is 3.
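The effect of subset is easiest to see next to the full-row comparison. This is a minimal sketch on the same DataFrame; by_a and by_row are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'z', 'z', 'y', 'z'],
})

# Compare rows on column 'A' only
by_a = df.duplicated(subset=['A'])

# Compare rows on every column (the default)
by_row = df.duplicated()

print(by_a.tolist(), int(by_a.sum()))
print(by_row.tolist(), int(by_row.sum()))
```

Only the last row, (3, 'z'), matches an earlier row in both columns, so the full-row count is smaller than the count based on 'A' alone.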
🔧 Debug
advanced
Identify the error in counting duplicates
What error does this code raise when trying to count duplicates in a DataFrame?
Pandas
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2], 'B': ['x', 'y', 'y']})
count = df.duplicated(subset='A', keep='maybe').sum()
print(count)
💡 Hint
Check the allowed values for the 'keep' parameter in duplicated().
Explanation
The keep parameter accepts only 'first', 'last', or False. Passing 'maybe' raises a ValueError.
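A short sketch of how the failure surfaces, and how the three accepted values behave on the same data. The variable names error_type and counts are illustrative, and the exact error message may differ between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2], 'B': ['x', 'y', 'y']})

# An unsupported keep value is rejected with a ValueError
try:
    df.duplicated(subset='A', keep='maybe')
    error_type = None
except ValueError as exc:
    error_type = type(exc).__name__
    print(error_type, exc)

# The three accepted values all run without error
counts = {
    keep: int(df.duplicated(subset='A', keep=keep).sum())
    for keep in ('first', 'last', False)
}
print(counts)
```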
🚀 Application
advanced
Find how many unique duplicate rows exist
How many unique rows appear more than once in this DataFrame?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3, 4],
    'B': ['x', 'y', 'y', 'z', 'z', 'z', 'w']
})

# Count unique rows that have duplicates
counts = df.value_counts()
num_unique_duplicates = (counts > 1).sum()
print(num_unique_duplicates)
💡 Hint
Use value_counts() to count how many times each row appears, then count how many appear more than once.
Explanation
value_counts() counts each distinct row: (2, 'y') appears twice, (3, 'z') appears three times, and (1, 'x') and (4, 'w') appear once each. Two distinct rows have a count above 1, so the output is 2.
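The same answer can be cross-checked a second way. The sketch below first filters the value_counts() result, then uses duplicated() with drop_duplicates() as an alternative route; repeated and alt are illustrative names, and DataFrame.value_counts() is assumed to be available (pandas 1.1 or later).

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3, 4],
    'B': ['x', 'y', 'y', 'z', 'z', 'z', 'w'],
})

# Route 1: count each distinct row, keep those seen more than once
counts = df.value_counts()
repeated = counts[counts > 1]
print(repeated)

# Route 2: take the repeated rows, then collapse them to distinct rows
alt = len(df[df.duplicated()].drop_duplicates())
print(len(repeated), alt)
```

Both routes count distinct rows rather than total repeats, which is why the result is smaller than df.duplicated().sum().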
🧠 Conceptual
expert
Understanding duplicated() with keep=False
What is the output of this code that marks all duplicates including the first occurrence?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [2, 2, 2, 3, 3, 3],
    'B': ['y', 'y', 'y', 'z', 'z', 'z']
})
duplicates_all = df.duplicated(keep=False)
print(duplicates_all.sum())
💡 Hint
The keep=False option marks all duplicates as True, including the first occurrence.
Explanation
With keep=False, every row that has at least one identical row elsewhere is marked True, first occurrence included. Rows (2, 'y') and (3, 'z') each appear three times, so all 6 rows are True and the sum is 6.
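Placing the three keep settings side by side shows which occurrence each one spares. This is a small sketch on the same DataFrame; the dictionary name masks is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, 2, 2, 3, 3, 3],
    'B': ['y', 'y', 'y', 'z', 'z', 'z'],
})

# Build one boolean mask per keep setting
masks = {
    keep: df.duplicated(keep=keep).tolist()
    for keep in ('first', 'last', False)
}

for keep, mask in masks.items():
    # sum() on a list of booleans counts the True entries
    print(repr(keep), mask, sum(mask))
```

With 'first' or 'last', one row per group stays False, so the count is 4; only keep=False flags all 6.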