
Counting duplicates in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Counting duplicate rows in a DataFrame
What is the output of this code that counts duplicate rows in a pandas DataFrame?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})

count_duplicates = df.duplicated().sum()
print(count_duplicates)
A. 4
B. 2
C. 0
D. 3
💡 Hint
Remember that, by default (keep='first'), duplicated() marks every repeat of a row as True and leaves only the first occurrence as False.
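To make the hint concrete, here is a minimal sketch on a separate, made-up DataFrame (the `orders` table and its column names are illustrative assumptions, not part of the challenge). It shows the boolean mask that duplicated() returns and how summing it counts the repeats.

```python
import pandas as pd

# Hypothetical data, deliberately different from the quiz DataFrame.
orders = pd.DataFrame({
    'customer': ['ann', 'bob', 'ann', 'ann', 'cy'],
    'item': ['pen', 'ink', 'pen', 'pen', 'pen'],
})

# With the default keep='first', the first ('ann', 'pen') row stays False
# and each later repeat of it is marked True.
mask = orders.duplicated()
print(mask.tolist())    # [False, False, True, True, False]

# True counts as 1, so summing the mask counts the repeated rows.
print(int(mask.sum()))  # 2
```

A row counts as a repeat only when every column matches an earlier row, which is why ('cy', 'pen') is not flagged.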
Data Output
intermediate
Counting duplicates with subset columns
Given this DataFrame, what does the code print when duplicates are counted using only column 'A'?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'z', 'z', 'y', 'z']
})

count_dup_subset = df.duplicated(subset=['A']).sum()
print(count_dup_subset)
A. 3
B. 2
C. 1
D. 0
💡 Hint
Duplicates are counted only by column 'A', ignoring column 'B'.
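As a rough illustration of how `subset` changes the count, the sketch below compares a full-row check with a single-column check. The `readings` DataFrame and its column names are invented for this example.

```python
import pandas as pd

# Hypothetical sensor readings, not the quiz data.
readings = pd.DataFrame({
    'sensor': ['s1', 's1', 's2', 's2', 's2'],
    'value': [10, 11, 20, 20, 21],
})

# Full-row comparison: only the second ('s2', 20) row repeats an earlier row.
full_row_repeats = int(readings.duplicated().sum())

# Column-only comparison: every later 's1' or 's2' counts as a repeat,
# whatever is in 'value'.
sensor_repeats = int(readings.duplicated(subset=['sensor']).sum())

print(full_row_repeats, sensor_repeats)  # 1 3
```

Narrowing the subset can only keep the count the same or raise it, because rows that match on every column also match on any subset of them.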
🔧 Debug
advanced
Identify the error in counting duplicates
What error does this code raise when trying to count duplicates in a DataFrame?
Pandas
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2], 'B': ['x', 'y', 'y']})

count = df.duplicated(subset='A', keep='maybe').sum()
print(count)
A. ValueError: keep must be one of {'first', 'last', False}
B. No error, prints 1
C. SyntaxError: invalid syntax
D. TypeError: duplicated() got an unexpected keyword argument 'keep'
💡 Hint
Check the allowed values for the 'keep' parameter in duplicated().
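For reference, the sketch below runs duplicated() with each of the three documented `keep` values on a small made-up DataFrame (`tags` is an illustrative name, not from the challenge), so the effect of each option can be compared side by side.

```python
import pandas as pd

# Hypothetical data: 'a' appears at positions 0, 2 and 3.
tags = pd.DataFrame({'tag': ['a', 'b', 'a', 'a']})

# keep='first' (the default): the first 'a' is kept, later ones are flagged.
first = tags.duplicated(keep='first').tolist()
print(first)      # [False, False, True, True]

# keep='last': the last 'a' is kept, earlier ones are flagged.
last = tags.duplicated(keep='last').tolist()
print(last)       # [True, False, True, False]

# keep=False: no occurrence is kept, so every 'a' is flagged.
none_kept = tags.duplicated(keep=False).tolist()
print(none_kept)  # [True, False, True, True]
```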
🚀 Application
advanced
Count the distinct rows that appear more than once
How many unique rows appear more than once in this DataFrame?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3, 4],
    'B': ['x', 'y', 'y', 'z', 'z', 'z', 'w']
})

# Count unique rows that have duplicates
counts = df.value_counts()
num_unique_duplicates = (counts > 1).sum()
print(num_unique_duplicates)
A. 4
B. 2
C. 3
D. 1
💡 Hint
Use value_counts() to count how many times each row appears, then count how many appear more than once.
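The approach in the hint can be sketched on unrelated data as follows. The `pets` DataFrame is a made-up example, and the sketch assumes a pandas version that provides DataFrame.value_counts() (1.1 or later).

```python
import pandas as pd

# Hypothetical data, not the quiz DataFrame.
pets = pd.DataFrame({
    'kind': ['cat', 'dog', 'cat', 'fish', 'dog', 'cat'],
    'color': ['black', 'tan', 'black', 'gold', 'tan', 'white'],
})

# One entry per distinct row, indexed by the row's values, holding its frequency.
row_counts = pets.value_counts()
print(row_counts.loc[('cat', 'black')])  # 2

# Keep only the distinct rows that occur more than once.
repeated = row_counts[row_counts > 1]
print(len(repeated))  # 2
```

This answers a different question from duplicated().sum(): it counts how many distinct rows are repeated, not how many extra copies exist. By default value_counts() skips rows containing missing values, so data with NaN may need `dropna=False` where the installed pandas supports it.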
🧠 Conceptual
expert
Understanding duplicated() with keep=False
What is the output of this code that marks all duplicates including the first occurrence?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [2, 2, 2, 3, 3, 3],
    'B': ['y', 'y', 'y', 'z', 'z', 'z']
})

duplicates_all = df.duplicated(keep=False)
print(duplicates_all.sum())
A. 0
B. 3
C. 6
D. 4
💡 Hint
The keep=False option marks every row that belongs to a duplicated group as True, including the first occurrence.
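One common use of keep=False is to pull out every member of each duplicated group for inspection. The sketch below does this on an invented `log` DataFrame (the names and values are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical event log, not the quiz data.
log = pd.DataFrame({
    'user': ['u1', 'u2', 'u1', 'u3', 'u2', 'u1'],
    'action': ['login', 'login', 'login', 'logout', 'login', 'logout'],
})

# True for every row whose (user, action) pair occurs more than once.
in_dup_group = log.duplicated(keep=False)

# Boolean indexing returns all copies, first occurrences included.
repeated_rows = log[in_dup_group]
print(repeated_rows.index.tolist())  # [0, 1, 2, 4]

# Compare with the default, which skips the first copy in each group.
print(int(in_dup_group.sum()), int(log.duplicated().sum()))  # 4 2
```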