Challenge - 5 Problems

Duplicate Mastery

❓ Predict Output

intermediate

Output of drop_duplicates with keep='first'

What is the output DataFrame after running this code?

Pandas

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})

result = df.drop_duplicates(subset=['A'], keep='first')
print(result)

   A  B
1  2  y
3  3  z

   A  B
2  2  y
4  3  z
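
One way to check a prediction here is to build the boolean mask that `DataFrame.duplicated` returns and compare it with the `drop_duplicates` result. The following is a minimal sketch using the same `df`; the names `mask` and `kept` are illustrative and not part of the challenge.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})

# duplicated(keep='first') marks every repeat of a value in 'A'
# except its first occurrence
mask = df.duplicated(subset=['A'], keep='first')

# Selecting the unmarked rows should match drop_duplicates
# called with the same arguments
kept = df[~mask]
result = df.drop_duplicates(subset=['A'], keep='first')

print(mask.tolist())
print(kept.equals(result))
print(list(result.index))
```

Rows that survive keep their original index labels, so the printed index shows which occurrence of each value was retained.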

❓ Predict Output

intermediate

Output of drop_duplicates with keep='last'

What is the output DataFrame after running this code?

Pandas

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})

result = df.drop_duplicates(subset=['A'], keep='last')
print(result)

   A  B
1  2  y
4  3  z

   A  B
2  2  y
4  3  z
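
Because the answer choices differ only in their index labels, it may help to inspect the index of the result directly. This is a sketch under the assumption that pandas 1.0 or later is installed, since it also uses the optional `ignore_index` parameter; the names `last` and `relabeled` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})

# keep='last' retains the final occurrence of each value in 'A'
# and preserves that row's original index label
last = df.drop_duplicates(subset=['A'], keep='last')

# ignore_index=True relabels the surviving rows 0..n-1 instead
relabeled = df.drop_duplicates(subset=['A'], keep='last', ignore_index=True)

print(list(last.index))
print(list(relabeled.index))
```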

❓ Predict Output

advanced

Output of drop_duplicates with keep=False

What is the output DataFrame after running this code?

Pandas

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})

result = df.drop_duplicates(subset=['A'], keep=False)
print(result)

   A  B
1  2  y
3  3  z

   A  B
0  1  x

C. Empty DataFrame
Columns: [A, B]
Index: []
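
With `keep=False`, every row whose value in `'A'` appears more than once is dropped, so only values that occur exactly once remain. The sketch below cross-checks that reading with `value_counts`; the names `occurrences` and `only_once` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z']
})

# For each row, look up how many times its 'A' value occurs in the column
occurrences = df['A'].map(df['A'].value_counts())

# Rows whose value occurs exactly once
only_once = df[occurrences == 1]

result = df.drop_duplicates(subset=['A'], keep=False)

print(occurrences.tolist())
print(only_once.equals(result))
```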

🔧 Debug

advanced

Identify the error in drop_duplicates usage

What error does this code raise when executed?

Pandas

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2],
    'B': ['x', 'y', 'y']
})

result = df.drop_duplicates(subset=['A'], keep='middle')
print(result)

A. ValueError: keep must be either 'first', 'last', or False

B. TypeError: unhashable type: 'list'

C. KeyError: 'middle'

D. No error, outputs DataFrame with all rows
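
To see how pandas reacts to an unsupported `keep` value without crashing a script, the call can be wrapped in `try`/`except`. This is a minimal sketch; the variable `error_type` is illustrative, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2],
    'B': ['x', 'y', 'y']
})

error_type = None
try:
    # 'middle' is not one of the accepted values: 'first', 'last', False
    df.drop_duplicates(subset=['A'], keep='middle')
except ValueError as exc:
    # Record the exception class and show the message pandas produced
    error_type = type(exc).__name__
    print(error_type, exc)
```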

🚀 Application

expert

Count unique rows after drop_duplicates with different keep values

Given this DataFrame, how many rows remain after applying drop_duplicates on column 'A' with keep='first', keep='last', and keep=False respectively?

Pandas

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3, 4],
    'B': ['x', 'y', 'y', 'z', 'z', 'z', 'w']
})

count_first = len(df.drop_duplicates(subset=['A'], keep='first'))
count_last = len(df.drop_duplicates(subset=['A'], keep='last'))
count_none = len(df.drop_duplicates(subset=['A'], keep=False))

print(count_first, count_last, count_none)

A. 4 4 2

B. 4 4 1

C. 5 5 3

D. 5 5 2
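
The three counts can also be reasoned about without calling `drop_duplicates`: `keep='first'` and `keep='last'` each leave one row per distinct value, while `keep=False` leaves only values that occur once. The sketch below checks this, assuming column `'A'` has no missing values; the names `n_distinct` and `n_singletons` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3, 4],
    'B': ['x', 'y', 'y', 'z', 'z', 'z', 'w']
})

counts = df['A'].value_counts()

# One surviving row per distinct value for keep='first' and keep='last'
n_distinct = df['A'].nunique()

# Only values seen exactly once survive keep=False
n_singletons = int((counts == 1).sum())

count_first = len(df.drop_duplicates(subset=['A'], keep='first'))
count_last = len(df.drop_duplicates(subset=['A'], keep='last'))
count_none = len(df.drop_duplicates(subset=['A'], keep=False))

print(n_distinct == count_first == count_last)
print(n_singletons == count_none)
```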
