Challenge - 5 Problems
Duplicate Remover Master
❓ Predict Output
Difficulty: intermediate
Output of drop_duplicates with subset
What is the output DataFrame after running this code?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40]
})
result = df.drop_duplicates(subset=['A'])
print(result)
💡 Hint
Look at the first occurrence of each unique value in column 'A'.
Explanation
drop_duplicates with subset=['A'] keeps the first row for each unique value in column 'A'. Rows with duplicate 'A' values after the first are removed.
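As a quick check of this explanation, the sketch below reruns the example and lists which rows survive. The name `kept_index` is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40],
})

# Only column 'A' is compared; 'B' and 'C' play no part in spotting duplicates.
result = df.drop_duplicates(subset=['A'])

# Index labels of the surviving rows (illustrative helper variable).
kept_index = result.index.tolist()
print(kept_index)  # [0, 1, 3]
print(result)
```

Row 4 is dropped even though its 'C' value (40) differs from row 0, because only 'A' is inspected.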
❓ Data Output
Difficulty: intermediate
Number of rows after drop_duplicates
How many rows remain after removing duplicates based on columns 'A' and 'B'?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40]
})
result = df.drop_duplicates(subset=['A', 'B'])
print(len(result))
💡 Hint
Check unique pairs of (A, B) in the DataFrame.
Explanation
The unique (A, B) pairs are (1, x), (2, y), and (3, z). Row 2 repeats the pair (2, y) and row 4 repeats the pair (1, x), so both are removed; row 4 is dropped even though its 'C' value differs, because 'C' is not in the subset. Rows 0, 1, and 3 remain, so the result has 3 rows.
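One way to see which rows count as duplicates before dropping them is `DataFrame.duplicated`, which returns a boolean mask using the same `subset` logic. The sketch below is illustrative; `mask` is a name chosen here, not part of the challenge.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40],
})

# True marks a row whose (A, B) pair has already appeared in an earlier row.
mask = df.duplicated(subset=['A', 'B'])
print(mask.tolist())  # [False, False, True, False, True]

# Dropping duplicates keeps exactly the rows where the mask is False.
result = df.drop_duplicates(subset=['A', 'B'])
print(len(result))  # 3
```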
🔧 Debug
Difficulty: advanced
Error when using drop_duplicates with inplace=True
What error will this code raise?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2], 'B': [3, 3, 4]})
df.drop_duplicates(inplace=True, subset=['A', 'B'])
print(df)
💡 Hint
Check what drop_duplicates returns when inplace=True.
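Explanation
This code raises no error. drop_duplicates with inplace=True modifies the DataFrame in place and returns None. Because the return value is not assigned back to df, df is still a DataFrame and prints its two remaining rows (indices 0 and 2).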
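The sketch below illustrates the None return value and a related pitfall that this question hints at: assigning the result of an inplace call back to the variable. The names `returned` and `overwritten` are hypothetical, used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2], 'B': [3, 3, 4]})

# With inplace=True the call mutates df and returns None.
returned = df.drop_duplicates(subset=['A', 'B'], inplace=True)
print(returned)            # None
print(df.index.tolist())   # [0, 2]

# Pitfall: assigning that None back replaces the DataFrame itself.
overwritten = pd.DataFrame({'A': [1, 1, 2], 'B': [3, 3, 4]})
overwritten = overwritten.drop_duplicates(inplace=True)
print(overwritten is None)  # True
```

Any later attribute access on `overwritten` would then fail with an AttributeError, which is why the inplace form should be used either with no assignment or not at all.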
🚀 Application
Difficulty: advanced
Removing duplicates but keeping last occurrence
Which option correctly removes duplicates from DataFrame df based on column 'A' but keeps the last occurrence?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40]
})
💡 Hint
The keep parameter controls which duplicate to keep.
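Explanation
Passing keep='last' together with subset=['A'], as in df.drop_duplicates(subset=['A'], keep='last'), keeps the last occurrence of each duplicated 'A' value and removes the earlier ones.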
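To compare the settings of the keep parameter side by side, here is a minimal sketch on the same DataFrame. The variable names `first`, `last`, and `none_kept` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40],
})

# Default: keep the first row seen for each value of 'A'.
first = df.drop_duplicates(subset=['A'], keep='first')
print(first.index.tolist())      # [0, 1, 3]

# keep='last': keep the final row for each value of 'A'.
last = df.drop_duplicates(subset=['A'], keep='last')
print(last.index.tolist())       # [2, 3, 4]

# keep=False: drop every row whose 'A' value occurs more than once.
none_kept = df.drop_duplicates(subset=['A'], keep=False)
print(none_kept.index.tolist())  # [3]
```

The surviving rows stay in their original order; with keep='last' the row for A == 1 is the one at index 4, whose 'C' value is 40.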
🧠 Conceptual
Difficulty: expert
Effect of drop_duplicates on index
After using drop_duplicates on a DataFrame, what happens to the index by default?
💡 Hint
Think about whether drop_duplicates changes the index or just drops rows.
Explanation
drop_duplicates removes duplicate rows but keeps the original index values, so gaps remain unless reset_index() is called.
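The index gaps can be observed directly. The sketch below reuses the DataFrame from the earlier questions and shows two ways to get a fresh 0..n-1 index; the `ignore_index=True` form assumes pandas 1.0 or newer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40],
})

# Without a subset, only fully identical rows count; row 2 repeats row 1.
deduped = df.drop_duplicates()
print(deduped.index.tolist())   # [0, 1, 3, 4]  (label 2 is missing)

# Option 1: renumber afterwards, discarding the old labels.
renumbered = deduped.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Option 2 (assumes pandas >= 1.0): renumber as part of the same call.
direct = df.drop_duplicates(ignore_index=True)
print(direct.index.tolist())      # [0, 1, 2, 3]
```

Without drop=True, reset_index() would keep the old labels as a new column named index.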