
duplicated() for finding duplicates in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate)
Output of duplicated() with default parameters
What is the output of the following code snippet?
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 4, 4], 'B': ['x', 'y', 'y', 'z', 'x', 'x', 'x']})
result = df.duplicated()
print(result.tolist())
A) [True, False, True, False, True, False, True]
B) [False, True, True, False, True, True, True]
C) [False, False, False, False, False, False, False]
D) [False, False, True, False, False, True, True]
💡 Hint
duplicated() marks a row True when an identical row has already appeared above it; the first occurrence of each row stays False.
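A minimal sketch of the default behaviour (keep='first') on separate, made-up data. The column name `name` and its values are illustrative only and are not part of the challenge.

```python
import pandas as pd

# Hypothetical data, unrelated to the question above
df = pd.DataFrame({"name": ["a", "b", "a", "c", "b"]})

# Default keep='first': the first occurrence of each row stays False,
# every later repeat of that row is marked True
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False, True]
```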
Data Output (intermediate)
Count of duplicated rows with subset and keep='last'
Given the DataFrame below, how many rows are marked as duplicates when using subset=['A'] and keep='last'?
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 5, 7, 6, 8, 5]})
duplicates = df.duplicated(subset=['A'], keep='last')
count = duplicates.sum()
print(count)
A) 3
B) 2
C) 1
D) 4
💡 Hint
keep='last' marks all duplicates except the last occurrence as True.
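To compare keep='first' and keep='last' side by side, here is a small sketch on hypothetical data (the columns `key` and `val` are made up for illustration). Because of subset=["key"], only `key` is compared and `val` is ignored.

```python
import pandas as pd

# Hypothetical data: 'p' appears three times, 'q' and 'r' once each
df = pd.DataFrame({"key": ["p", "q", "p", "p", "r"], "val": [1, 2, 3, 4, 5]})

# keep='first' (default): every repeat after the first 'p' is True
first = df.duplicated(subset=["key"])
# keep='last': every 'p' except the last one is True
last = df.duplicated(subset=["key"], keep="last")

print(first.tolist())  # [False, False, True, True, False]
print(last.tolist())   # [True, False, True, False, False]
print(last.sum())      # 2, i.e. occurrences minus one for each repeated key
```

Either way, summing the mask counts the same number of flagged rows; only which copy is left unflagged changes.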
🔧 Debug (advanced)
Identify the error in duplicated() usage
What error will this code raise?
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
result = df.duplicated(keep='middle')
print(result)
A) ValueError: keep must be either 'first', 'last', or False
B) TypeError: duplicated() got an unexpected keyword argument 'keep'
C) SyntaxError: invalid syntax
D) No error; prints a boolean Series
💡 Hint
Check the allowed values for the keep parameter in duplicated().
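The sketch below, on hypothetical data, loops over the three documented values of keep and then passes an unsupported one (the string "both" is an arbitrary invalid choice). The exact wording of the error message may differ between pandas versions, so only the exception type is checked.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2]})

# Each accepted value of keep produces a different mask
masks = {keep: df.duplicated(keep=keep).tolist() for keep in ("first", "last", False)}
print(masks)

# Any other value is rejected when the method runs
try:
    df.duplicated(keep="both")
    error = None
except ValueError as exc:
    error = exc
print(type(error).__name__)
```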
🚀 Application (advanced)
Filter unique rows using duplicated()
Which code snippet correctly filters the DataFrame to keep only the rows that appear exactly once, dropping every copy of a duplicated row (comparing all columns)?
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': ['a', 'b', 'b', 'c']})
A) df[df.duplicated(keep=False)]
B) df[df.duplicated(keep='first')]
C) df[~df.duplicated(keep=False)]
D) df[~df.duplicated(keep='last')]
💡 Hint
Use keep=False to mark every copy of a repeated row as True, then invert the mask with ~ to keep only the rows that occur once.
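It can help to contrast "rows that occur exactly once" with "one copy of every row". A minimal sketch on hypothetical data (not the DataFrame from the question): inverting a keep=False mask removes all copies of a repeated row, while inverting a keep='first' mask deduplicates, which matches what drop_duplicates() does with its defaults.

```python
import pandas as pd

# Hypothetical data: row (1, "u") appears twice; (3, "w") and (3, "x") differ in B
df = pd.DataFrame({"A": [1, 1, 2, 3, 3], "B": ["u", "u", "v", "w", "x"]})

# Rows that occur exactly once: every copy of a repeated row is dropped
only_once = df[~df.duplicated(keep=False)]

# One copy of every row: later repeats are dropped
deduped = df[~df.duplicated(keep="first")]

print(only_once.index.tolist())  # [2, 3, 4]
print(deduped.index.tolist())    # [0, 2, 3, 4]
```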
🧠 Conceptual (expert)
Understanding duplicated() behavior with subset and keep=False
Consider a DataFrame with columns 'X' and 'Y'. If you run df.duplicated(subset=['X'], keep=False), what does the output represent?
A) Marks only the first occurrence of each duplicate value in column 'X' as True.
B) Marks all rows that have duplicate values in column 'X', including the first and last occurrences.
C) Marks only the last occurrence of each duplicate value in column 'X' as True.
D) Marks no rows as duplicates because keep=False disables duplicate detection.
💡 Hint
With keep=False, no occurrence of a repeated value is treated as the one to keep, so every copy is flagged.
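A short sketch on made-up 'X'/'Y' data shows keep=False combined with subset, plus one way to cross-check it: a row is flagged exactly when its 'X' value occurs more than once. The value_counts-based check is an illustrative equivalence, not how pandas computes the mask internally.

```python
import pandas as pd

# Hypothetical data: X = 10 and X = 20 each appear twice, X = 30 once
df = pd.DataFrame({"X": [10, 20, 10, 30, 20], "Y": ["a", "b", "c", "d", "e"]})

# True for every row whose X value is repeated, regardless of Y
mask = df.duplicated(subset=["X"], keep=False)
print(mask.tolist())  # [True, True, True, False, True]

# Cross-check: map each X to how often it occurs, then test "more than once"
counts = df["X"].map(df["X"].value_counts())
same = (counts > 1).tolist() == mask.tolist()
print(same)  # True

# View each duplicate group together (stable sort keeps the original order within a group)
print(df[mask].sort_values("X", kind="stable"))
```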