Challenge - 5 Problems

Duplicate Remover Master

❓ Predict Output

intermediate

Output of drop_duplicates with subset

What is the output DataFrame after running this code?

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40]
})

result = df.drop_duplicates(subset=['A'])
print(result)

A.
   A  B   C
0  1  x  10
1  2  y  20
2  2  y  20
3  3  z  30
4  1  x  40

B.
   A  B   C
0  1  x  10
2  2  y  20
3  3  z  30

C.
   A  B   C
1  2  y  20
2  2  y  20
3  3  z  30

D.
   A  B   C
0  1  x  10
1  2  y  20
3  3  z  30

❓ Data Output

intermediate

Number of rows after drop_duplicates

How many rows remain after removing duplicates based on columns 'A' and 'B'?

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40]
})

result = df.drop_duplicates(subset=['A', 'B'])
print(len(result))
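
One way to reason about the count is to look at the boolean mask returned by `DataFrame.duplicated`, which flags the rows that `drop_duplicates` would discard. The sketch below uses a hypothetical frame (`demo`, with made-up columns `k1`, `k2`, and `v`) rather than the one in the question, so treat it as an illustration of the technique and not as the answer.

```python
import pandas as pd

# Hypothetical frame, deliberately different from the one in the question.
demo = pd.DataFrame({
    'k1': [1, 1, 2, 2],
    'k2': ['a', 'a', 'b', 'c'],
    'v': [5, 6, 7, 8],
})

# duplicated() marks each row whose (k1, k2) pair already appeared earlier.
mask = demo.duplicated(subset=['k1', 'k2'])

# The rows kept by drop_duplicates are the rows where the mask is False.
kept = demo.drop_duplicates(subset=['k1', 'k2'])

print(mask.tolist())
print(len(kept), int((~mask).sum()))
```

Columns outside `subset` (here `v`) play no part in deciding whether a row counts as a duplicate.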

🔧 Debug

advanced

Error when using drop_duplicates with inplace=True

What error, if any, will this code raise?

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2], 'B': [3, 3, 4]})
df.drop_duplicates(inplace=True, subset=['A', 'B'])
print(df)

A. NameError: name 'df' is not defined

B. No error, prints the DataFrame with duplicates removed

C. AttributeError: 'NoneType' object has no attribute 'print'

D. TypeError: drop_duplicates() got an unexpected keyword argument 'inplace'

🚀 Application

advanced

Removing duplicates but keeping last occurrence

Which option correctly removes duplicates from DataFrame df based on column 'A' but keeps the last occurrence?

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 1],
    'B': ['x', 'y', 'y', 'z', 'x'],
    'C': [10, 20, 20, 30, 40]
})

A. df.drop_duplicates(subset=['A'], keep='none')

B. df.drop_duplicates(subset=['A'], keep='first')

C. df.drop_duplicates(subset=['A'], keep='last')

D. df.drop_duplicates(subset=['A'], keep=False)
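
If the `keep` parameter is unfamiliar, a small experiment may help before answering. The sketch below tries the three values that pandas documents for `keep` on a hypothetical frame (`demo`, `key`, `val`, and the `labels` dictionary are made-up names for illustration) and records which index labels survive each call.

```python
import pandas as pd

# Hypothetical frame: the value 'p' appears at index labels 0 and 2.
demo = pd.DataFrame({'key': ['p', 'q', 'p'], 'val': [1, 2, 3]})

# Collect the surviving index labels for each documented keep value.
labels = {}
for keep in ('first', 'last', False):
    out = demo.drop_duplicates(subset=['key'], keep=keep)
    labels[keep] = out.index.tolist()
    print(f"keep={keep!r}: index labels {labels[keep]}")
```

Comparing the printed labels shows which occurrence of a repeated value each setting retains.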

🧠 Conceptual

expert

Effect of drop_duplicates on index

After using drop_duplicates on a DataFrame, what happens to the index by default?

A. The original index values are preserved, including gaps from removed rows

B. The index is reset to a new continuous range starting from 0

C. The index is dropped and replaced with a default integer index without gaps

D. The index is converted to a MultiIndex based on duplicate columns
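
To check the behavior directly, it can help to give a frame distinctive index labels and print the index after the call. This is a minimal sketch on a hypothetical frame (`demo` and `key` are made-up names), assuming pandas 1.0 or later, where `drop_duplicates` accepts an `ignore_index` argument.

```python
import pandas as pd

# Hypothetical frame with non-default index labels so any change is visible.
demo = pd.DataFrame({'key': ['p', 'p', 'q', 'q']}, index=[10, 11, 12, 13])

# Default call: inspect which index labels the result carries.
default = demo.drop_duplicates()

# Same call with ignore_index=True, for comparison.
renumbered = demo.drop_duplicates(ignore_index=True)

print(default.index.tolist())
print(renumbered.index.tolist())
```

The difference between the two printed lists shows what the default does and what `ignore_index=True` changes.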
