Challenge - 5 Problems
Duplicate Mastery
❓ Predict Output
intermediate
Output of drop_duplicates with keep='first'
What is the output DataFrame after running this code?
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})
result = df.drop_duplicates(subset=['A'], keep='first')
print(result)
```
💡 Hint
keep='first' keeps the first occurrence of each duplicate group.
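Explanation: drop_duplicates with keep='first' keeps the first row for each unique value in column 'A'. The values 2 and 3 appear more than once, so only their first occurrences (index 1 and index 3) survive, together with the single row where A=1. The original index labels are preserved, so the result contains the rows with index 0 (1, x), 1 (2, y) and 3 (3, z).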
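One way to see what keep='first' does is to build the boolean mask from DataFrame.duplicated, which flags every repeat after the first occurrence, and filter the frame with its negation. The sketch below assumes the names `mask` and `manual` purely for illustration; it checks that the mask-based filter selects the same rows as drop_duplicates and that the original index labels survive.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z'],
})

# True for every row whose 'A' value already appeared in an earlier row
mask = df.duplicated(subset=['A'], keep='first')
print(mask.tolist())  # [False, False, True, False, True, True]

# Keeping the rows where the mask is False selects the same rows as drop_duplicates
manual = df[~mask]
result = df.drop_duplicates(subset=['A'], keep='first')
print(manual.equals(result))  # True

# The original index labels are kept, not renumbered
print(result.index.tolist())  # [0, 1, 3]
```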
❓ Predict Output
intermediate
Output of drop_duplicates with keep='last'
What is the output DataFrame after running this code?
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})
result = df.drop_duplicates(subset=['A'], keep='last')
print(result)
```
💡 Hint
keep='last' keeps the last occurrence of each duplicate group.
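Explanation: drop_duplicates with keep='last' keeps the last row for each unique value in column 'A'. A=1 occurs only once (index 0); for A=2 the last occurrence is index 2, and for A=3 it is index 5. The result therefore contains the rows with index 0, 2 and 5, in their original order.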
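Because the surviving rows keep their original labels, the keep='last' result has the gapped index 0, 2, 5. The sketch below shows that, and also the ignore_index=True option (available in pandas 1.0 and later), which relabels the result 0, 1, 2. The variable name `renumbered` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z'],
})

# keep='last': the surviving row for each value of 'A' is its final occurrence
result = df.drop_duplicates(subset=['A'], keep='last')
print(result.index.tolist())  # [0, 2, 5]

# ignore_index=True relabels the surviving rows 0, 1, ..., n-1
renumbered = df.drop_duplicates(subset=['A'], keep='last', ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```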
❓ Predict Output
advanced
Output of drop_duplicates with keep=False
What is the output DataFrame after running this code?
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})
result = df.drop_duplicates(subset=['A'], keep=False)
print(result)
```
💡 Hint
keep=False drops every row whose 'A' value is duplicated, including the first and last occurrences, so only rows with a unique 'A' value remain.
Explanation: drop_duplicates with keep=False removes every row whose value in column 'A' occurs more than once, including the first and last occurrences. Only A=1 is unique, so only that row (index 0) remains.
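The difference from keep='first' is easiest to see in the mask returned by DataFrame.duplicated: with keep=False it flags every occurrence of a repeated value, not just the later ones. A minimal sketch, with `mask` as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z'],
})

# keep=False marks all rows of a duplicated value, including the first occurrence
mask = df.duplicated(subset=['A'], keep=False)
print(mask.tolist())  # [False, True, True, True, True, True]

# Only the row with A=1 is unflagged, so it is the only one that survives
result = df.drop_duplicates(subset=['A'], keep=False)
print(result.index.tolist())  # [0]
```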
🔧 Debug
advanced
Identify the error in drop_duplicates usage
What error does this code raise when executed?
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2],
    'B': ['x', 'y', 'y']
})
result = df.drop_duplicates(subset=['A'], keep='middle')
print(result)
```
💡 Hint
Check the allowed values for the keep parameter.
Explanation: The keep parameter accepts only 'first' (the default), 'last', or False. Passing 'middle' raises a ValueError.
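A quick way to confirm which keep values are accepted is to try each one and catch the ValueError. The sketch below records the row count for each valid value and the string 'ValueError' for the invalid one; the `outcomes` dictionary is an illustrative name, and the exact error message text may vary between pandas versions, so only the exception type is relied on.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2], 'B': ['x', 'y', 'y']})

outcomes = {}
for keep in ('first', 'last', False, 'middle'):
    try:
        # Valid values return a DataFrame; record how many rows survive
        outcomes[keep] = len(df.drop_duplicates(subset=['A'], keep=keep))
    except ValueError as exc:
        # Any other value is rejected; message wording depends on the pandas version
        outcomes[keep] = 'ValueError'
        print(f"keep={keep!r} raised ValueError: {exc}")

print(outcomes)
```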
🚀 Application
expert
Count the rows remaining after drop_duplicates with different keep values
Given this DataFrame, how many rows remain after applying drop_duplicates on column 'A' with keep='first', keep='last', and keep=False respectively?
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3, 4],
    'B': ['x', 'y', 'y', 'z', 'z', 'z', 'w']
})
count_first = len(df.drop_duplicates(subset=['A'], keep='first'))
count_last = len(df.drop_duplicates(subset=['A'], keep='last'))
count_none = len(df.drop_duplicates(subset=['A'], keep=False))
print(count_first, count_last, count_none)
```
💡 Hint
Count unique values and consider which rows are kept or removed.
Explanation: Column 'A' has 4 distinct values: 1, 2, 3 and 4. keep='first' and keep='last' each keep exactly one row per distinct value, so both leave 4 rows. keep=False removes every row whose value occurs more than once, so only the rows with A=1 and A=4 remain, leaving 2 rows. The code prints 4 4 2.
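The counting argument above can be checked without calling drop_duplicates at all: the number of distinct values gives the keep='first'/'last' count, and the number of values that occur exactly once gives the keep=False count. This sketch assumes column 'A' has no missing values, because nunique() skips NaN by default while drop_duplicates treats NaN values as equal to each other. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3, 4],
    'B': ['x', 'y', 'y', 'z', 'z', 'z', 'w'],
})

# How many times each value of 'A' occurs: 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 1
counts = df['A'].value_counts()

# keep='first' and keep='last' leave one row per distinct value
expected_first_or_last = int(df['A'].nunique())
# keep=False leaves only the rows whose value occurs exactly once
expected_none = int((counts == 1).sum())

print(expected_first_or_last, expected_none)  # 4 2
```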