
Removing duplicates (drop_duplicates) in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of drop_duplicates with subset
What is the output DataFrame after running this code?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40]
})

result = df.drop_duplicates(subset=['A'])
print(result)
A
   A  B   C
0  1  x  10
1  2  y  20
2  2  y  20
3  3  z  30
4  1  x  40
B
   A  B   C
0  1  x  10
2  2  y  20
3  3  z  30
C
   A  B   C
1  2  y  20
2  2  y  20
3  3  z  30
D
   A  B   C
0  1  x  10
1  2  y  20
3  3  z  30
💡 Hint
Look at the first occurrence of each unique value in column 'A'.
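As a minimal sketch of the behaviour the hint points at, the example below uses a made-up DataFrame (the `pets` data and its column names are hypothetical, not part of the challenge). It assumes the default `keep='first'` setting, under which only the first row seen for each value in the `subset` column is retained.

```python
import pandas as pd

# Hypothetical data, unrelated to the DataFrame in the question.
pets = pd.DataFrame({
    'owner': ['ann', 'bob', 'ann', 'cy'],
    'pet': ['cat', 'dog', 'fish', 'dog'],
})

# Only 'owner' is compared; 'pet' is ignored when deciding what is a duplicate.
# With the default keep='first', the earliest row for each owner is retained.
first_per_owner = pets.drop_duplicates(subset=['owner'])
print(first_per_owner)
```

Comparing the printed index labels with the original row positions shows which occurrence of each repeated value survived.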
Data Output
intermediate
Number of rows after drop_duplicates
How many rows remain after removing duplicates based on columns 'A' and 'B'?
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40]
})

result = df.drop_duplicates(subset=['A', 'B'])
print(len(result))
A. 3
B. 4
C. 5
D. 2
💡 Hint
Check unique pairs of (A, B) in the DataFrame.
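One way to check pairs by hand is with `DataFrame.duplicated`, which returns a boolean mask using the same matching rule as `drop_duplicates`. The sketch below uses hypothetical `orders` data (not the DataFrame from the question) and assumes a row counts as a duplicate only when every column in `subset` matches an earlier row.

```python
import pandas as pd

# Hypothetical data, unrelated to the DataFrame in the question.
orders = pd.DataFrame({
    'city': ['Oslo', 'Oslo', 'Rome', 'Oslo'],
    'item': ['pen', 'ink', 'pen', 'pen'],
    'qty': [1, 2, 3, 4],
})

# True marks rows whose (city, item) pair already appeared in an earlier row.
# 'qty' is not in the subset, so differing quantities do not matter.
dup_mask = orders.duplicated(subset=['city', 'item'])
print(dup_mask.tolist())

# Dropping the flagged rows leaves one row per distinct (city, item) pair.
unique_pairs = orders.drop_duplicates(subset=['city', 'item'])
print(len(unique_pairs))
```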
🔧 Debug
advanced
Error when using drop_duplicates with inplace=True
What happens when this code runs? Does it raise an error?
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2], 'B': [3, 3, 4]})
df.drop_duplicates(inplace=True, subset=['A', 'B'])
print(df)
A. NameError: name 'df' is not defined
B. No error, prints the DataFrame with duplicates removed
C. AttributeError: 'NoneType' object has no attribute 'print'
D. TypeError: drop_duplicates() got an unexpected keyword argument 'inplace'
💡 Hint
Check what drop_duplicates returns when inplace=True.
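To follow the hint, it can help to capture the return value of the call and inspect it separately from the DataFrame itself. This is a small sketch on hypothetical `scores` data; the variable name `returned` is only for illustration.

```python
import pandas as pd

# Hypothetical data, unrelated to the DataFrame in the question.
scores = pd.DataFrame({'name': ['a', 'a', 'b'], 'score': [5, 5, 7]})

# Capture whatever the in-place call hands back.
returned = scores.drop_duplicates(inplace=True)

print(type(returned))  # what the call returned
print(len(scores))     # how many rows the original object has now
```

A common mistake this exposes is writing `df = df.drop_duplicates(inplace=True)`, which rebinds `df` to the call's return value rather than to the modified DataFrame.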
🚀 Application
advanced
Removing duplicates but keeping last occurrence
Which option correctly removes duplicates from DataFrame df based on column 'A' but keeps the last occurrence?
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40]
})
A. df.drop_duplicates(subset=['A'], keep='none')
B. df.drop_duplicates(subset=['A'], keep='first')
C. df.drop_duplicates(subset=['A'], keep='last')
D. df.drop_duplicates(subset=['A'], keep=False)
💡 Hint
The keep parameter controls which duplicate to keep.
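The accepted values of `keep` in pandas are `'first'`, `'last'`, and `False`. The sketch below runs all three side by side on a hypothetical `log` DataFrame (not the one from the question) so the surviving index labels can be compared.

```python
import pandas as pd

# Hypothetical data, unrelated to the DataFrame in the question.
log = pd.DataFrame({'user': ['u1', 'u2', 'u1'], 'step': [1, 2, 3]})

# Same subset each time; only the keep argument changes.
kept_first = log.drop_duplicates(subset=['user'], keep='first')
kept_last = log.drop_duplicates(subset=['user'], keep='last')
kept_none = log.drop_duplicates(subset=['user'], keep=False)

for label, frame in [('first', kept_first), ('last', kept_last), ('False', kept_none)]:
    print(label, frame.index.tolist())
```

Note that `keep=False` is the boolean `False`, not a string; it discards every row that has a duplicate instead of keeping one representative.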
🧠 Conceptual
expert
Effect of drop_duplicates on index
After using drop_duplicates on a DataFrame, what happens to the index by default?
A. The original index values are preserved, including gaps from removed rows
B. The index is reset to a new continuous range starting from 0
C. The index is dropped and replaced with a default integer index without gaps
D. The index is converted to a MultiIndex based on duplicate columns
💡 Hint
Think about whether drop_duplicates changes the index or just drops rows.
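A quick way to test this yourself is to print the index labels before and after the call. The sketch below uses hypothetical `temps` data and also shows `reset_index(drop=True)` as one way to renumber rows explicitly when a contiguous index is wanted.

```python
import pandas as pd

# Hypothetical data, unrelated to any question above.
temps = pd.DataFrame({'day': ['mon', 'mon', 'tue'], 'temp': [20, 20, 22]})

# Default call: all columns are compared, no index options are passed.
deduped = temps.drop_duplicates()
print(deduped.index.tolist())

# Explicit renumbering as a separate step; drop=True discards the old labels
# instead of adding them back as a column.
renumbered = deduped.reset_index(drop=True)
print(renumbered.index.tolist())
```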