Keeping first vs last vs none in Pandas - Practice Questions

Challenge - 5 Problems
Predict Output
intermediate
Output of drop_duplicates with keep='first'
What is the output DataFrame after running this code?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})

result = df.drop_duplicates(subset=['A'], keep='first')
print(result)
A
   A  B
1  2  y
3  3  z
B
   A  B
2  2  y
4  3  z
C
   A  B
0  1  x
2  2  y
5  3  z
D
   A  B
0  1  x
1  2  y
3  3  z
💡 Hint
keep='first' retains the first row of each group of duplicates and drops the later ones; surviving rows keep their original index labels.
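To illustrate the hint without touching the question's data, here is a minimal sketch on a made-up frame. The `orders` DataFrame and its column names are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical data, unrelated to the DataFrame in the question.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cat", "bob"],
    "amount": [10, 20, 30, 40, 50],
})

# keep="first" retains the earliest row seen for each customer.
# The surviving rows keep their original index labels.
first_orders = orders.drop_duplicates(subset=["customer"], keep="first")
print(first_orders)
```

Only the `customer` column decides what counts as a duplicate here; `amount` is carried along from whichever row survives.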
Predict Output
intermediate
Output of drop_duplicates with keep='last'
What is the output DataFrame after running this code?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})

result = df.drop_duplicates(subset=['A'], keep='last')
print(result)
A
   A  B
0  1  x
1  2  y
3  3  z
B
   A  B
1  2  y
4  3  z
C
   A  B
0  1  x
2  2  y
5  3  z
D
   A  B
2  2  y
4  3  z
💡 Hint
keep='last' retains the last row of each group of duplicates and drops the earlier ones; surviving rows keep their original index labels.
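The same idea can be sketched for keep='last' on a made-up frame. The `readings` DataFrame and its column names are assumptions for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical sensor log, unrelated to the DataFrame in the question.
readings = pd.DataFrame({
    "sensor": ["s1", "s2", "s1", "s3", "s2"],
    "value": [0.1, 0.2, 0.3, 0.4, 0.5],
})

# keep="last" retains the final row seen for each sensor.
# Rows stay in their original relative order.
latest = readings.drop_duplicates(subset=["sensor"], keep="last")
print(latest)

# Optional: renumber the index if the original labels are not needed.
latest_renumbered = latest.reset_index(drop=True)
```

This pattern is often used to keep the most recent record per key, assuming the rows are already ordered by time.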
Predict Output
advanced
Output of drop_duplicates with keep=False
What is the output DataFrame after running this code?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})

result = df.drop_duplicates(subset=['A'], keep=False)
print(result)
A
   A  B
1  2  y
3  3  z
B
   A  B
0  1  x
C
Empty DataFrame
Columns: [A, B]
Index: []
D
   A  B
0  1  x
1  2  y
3  3  z
💡 Hint
keep=False drops every row whose value in the subset columns occurs more than once, so only rows with a unique value remain.
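One way to reason about keep=False is through `DataFrame.duplicated`, which marks the rows that `drop_duplicates` would remove. The sketch below uses a made-up `tags` frame (an assumption for illustration) rather than the question's data.

```python
import pandas as pd

# Hypothetical data, unrelated to the DataFrame in the question.
tags = pd.DataFrame({
    "tag": ["a", "b", "a", "c", "b", "d"],
    "score": [1, 2, 3, 4, 5, 6],
})

# With keep=False, every member of a duplicate group is flagged.
dup_mask = tags.duplicated(subset=["tag"], keep=False)

# Dropping with keep=False leaves only tags that occur exactly once.
only_unique = tags.drop_duplicates(subset=["tag"], keep=False)

# Filtering with the inverted mask gives the same rows.
same = only_unique.equals(tags[~dup_mask])
print(only_unique)
print(same)
```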
🔧 Debug
advanced
Identify the error in drop_duplicates usage
What error does this code raise when executed?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2],
    'B': ['x', 'y', 'y']
})

result = df.drop_duplicates(subset=['A'], keep='middle')
print(result)
A
ValueError: keep must be either 'first', 'last', or False
B
TypeError: unhashable type: 'list'
C
KeyError: 'middle'
D
No error, outputs DataFrame with all rows
💡 Hint
Check the allowed values for the keep parameter.
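As a rough sketch of what checking the allowed values might look like in your own code, the hypothetical helper below (`is_valid_keep` is not a pandas function) tests a candidate against the three documented options before calling `drop_duplicates`. The small `df` frame is made up for illustration.

```python
import pandas as pd

def is_valid_keep(value):
    """Hypothetical helper: True if value is a documented `keep` option."""
    # Documented options are the strings "first" and "last" and the bool False.
    return value is False or value in ("first", "last")

# Hypothetical data, unrelated to the DataFrame in the question.
df = pd.DataFrame({"k": ["a", "a", "b"]})

results = {}
for candidate in ["first", "last", False, "middle"]:
    if is_valid_keep(candidate):
        kept = df.drop_duplicates(subset=["k"], keep=candidate)
        results[candidate] = kept.index.tolist()
        print(repr(candidate), "->", results[candidate])
    else:
        # Skip the call instead of passing an undocumented value to pandas.
        print(repr(candidate), "-> not a documented keep value, skipped")
```

Validating up front is a design choice for this sketch; it sidesteps the question of how pandas itself reacts to an unsupported value.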
🚀 Application
expert
Count unique rows after drop_duplicates with different keep values
Given this DataFrame, how many rows remain after applying drop_duplicates on column 'A' with keep='first', keep='last', and keep=False respectively?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3, 4],
    'B': ['x', 'y', 'y', 'z', 'z', 'z', 'w']
})

count_first = len(df.drop_duplicates(subset=['A'], keep='first'))
count_last = len(df.drop_duplicates(subset=['A'], keep='last'))
count_none = len(df.drop_duplicates(subset=['A'], keep=False))

print(count_first, count_last, count_none)
A
4 4 2
B
4 4 1
C
5 5 3
D
5 5 2
💡 Hint
Count unique values and consider which rows are kept or removed.
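The counting idea in the hint can be sketched with `Series.nunique` and `Series.value_counts` on a made-up column. The `letters` frame is an assumption for illustration and contains no missing values.

```python
import pandas as pd

# Hypothetical data, unrelated to the DataFrame in the question.
letters = pd.DataFrame({
    "key": ["p", "q", "q", "r", "s", "s", "s", "t"],
})

counts = letters["key"].value_counts()

# keep="first" or keep="last": one row survives per distinct key.
n_distinct = letters["key"].nunique()

# keep=False: only keys that occur exactly once survive.
n_singletons = int((counts == 1).sum())

n_first = len(letters.drop_duplicates(subset=["key"], keep="first"))
n_last = len(letters.drop_duplicates(subset=["key"], keep="last"))
n_none = len(letters.drop_duplicates(subset=["key"], keep=False))

print(n_distinct, n_singletons)
print(n_first, n_last, n_none)
```

Note that `nunique()` ignores missing values by default, so this shortcut assumes the column has none.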