Challenge - 5 Problems

❓ Predict Output

intermediate

Output of drop_duplicates() with subset and keep parameters

What is the output DataFrame after running the following code?

Pandas

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z'],
    'C': [10, 20, 20, 30, 30, 30]
})

result = df.drop_duplicates(subset=['A'], keep='last')
print(result)

A.
   A  B   C
0  1  x  10
2  2  y  20
5  3  z  30

B.
   A  B   C
0  1  x  10
1  2  y  20
3  3  z  30

C.
   A  B   C
0  1  x  10
1  2  y  20
2  2  y  20
3  3  z  30
4  3  z  30
5  3  z  30

D.
   A  B   C
0  1  x  10
3  3  z  30
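One way to reason about the keep parameter is through duplicated(), which marks the rows that drop_duplicates() would remove. The sketch below uses a small hypothetical frame (demo, with made-up key and val columns, not the quiz data) and checks that the surviving index labels are the unmarked rows for each keep value.

```python
import pandas as pd

# Hypothetical data for illustration only; not the quiz DataFrame.
demo = pd.DataFrame({"key": ["a", "b", "a", "b", "c"], "val": [1, 2, 3, 4, 5]})

for keep in ("first", "last", False):
    # True marks a row that drop_duplicates() would remove for this keep value.
    mask = demo.duplicated(subset=["key"], keep=keep)
    kept = demo.drop_duplicates(subset=["key"], keep=keep)
    # The rows that survive are exactly the rows not marked as duplicates.
    assert list(kept.index) == list(demo[~mask].index)
    print(f"keep={keep!r}: kept index labels {list(kept.index)}")
```

Running the same loop against the quiz DataFrame is a quick way to check a predicted answer.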

❓ Data Output

intermediate

Number of rows after drop_duplicates() with keep=False

Given the DataFrame below, how many rows remain after drop_duplicates() with keep=False removes every row whose ('A', 'B') combination occurs more than once?

Pandas

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'w'],
    'C': [10, 20, 20, 30, 30, 40]
})

result = df.drop_duplicates(subset=['A', 'B'], keep=False)
print(len(result))
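A row count under keep=False can be cross-checked by counting the key combinations that occur exactly once. The sketch below does this on a hypothetical frame (demo, with made-up k1 and k2 columns); the groupby comparison is an illustrative check, not something the quiz itself uses.

```python
import pandas as pd

# Hypothetical data for illustration only; not the quiz DataFrame.
demo = pd.DataFrame({
    "k1": [1, 1, 2, 2, 3],
    "k2": ["p", "p", "q", "r", "s"],
})

# keep=False drops every row whose (k1, k2) pair appears more than once.
remaining = demo.drop_duplicates(subset=["k1", "k2"], keep=False)

# Cross-check: count the (k1, k2) groups that contain exactly one row.
group_sizes = demo.groupby(["k1", "k2"]).size()
singletons = int((group_sizes == 1).sum())

assert len(remaining) == singletons
print(len(remaining), singletons)
```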

🔧 Debug

advanced

Identify the error in drop_duplicates() usage

What happens when this code runs?

Pandas

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

result = df.drop_duplicates(subset='A,B')
print(result)

A. TypeError: unhashable type: 'list'

B. No error, prints the original DataFrame

C. ValueError: subset must be a list-like of column labels

D. KeyError: 'A,B'
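To see how subset treats a plain string compared with a list of labels, a small experiment can help. The sketch below uses a hypothetical frame (demo) and a hypothetical helper (try_subset) that returns either the result or the name of whatever exception type was raised, so the two call styles can be compared side by side.

```python
import pandas as pd

# Hypothetical data for illustration only; not the quiz DataFrame.
demo = pd.DataFrame({"u": [1, 1, 2], "v": [7, 7, 8]})

def try_subset(frame, subset):
    """Return the deduplicated frame, or the name of the exception type raised."""
    try:
        return frame.drop_duplicates(subset=subset)
    except Exception as exc:  # broad on purpose: only the type name is reported
        return type(exc).__name__

as_list = try_subset(demo, ["u", "v"])  # two separate column labels
as_string = try_subset(demo, "u,v")     # one string, treated as a single label

print(as_list)
print(as_string)
```

A comma-separated string can be turned into separate labels with "u,v".split(",") before it is passed as subset.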

🚀 Application

advanced

Remove duplicates keeping first occurrence per group

You have a DataFrame of sales data whose rows are already in date order, and you want to keep only each customer's first sale. Which code snippet achieves this?

Pandas

import pandas as pd

df = pd.DataFrame({
    'customer_id': [101, 102, 101, 103, 102],
    'sale_amount': [200, 150, 300, 400, 100],
    'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']
})

A. df.drop_duplicates(subset=['customer_id', 'sale_amount'], keep='last')

B. df.drop_duplicates(subset=['sale_amount'], keep='last')

C. df.drop_duplicates(subset=['customer_id'], keep='first')

D. df.drop_duplicates(subset=['date'], keep=False)
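Note that drop_duplicates() decides which occurrence is "first" by row order, not by any date column. When rows are not already in chronological order, one option is to sort before deduplicating. The sketch below assumes a hypothetical orders frame with made-up user, ts and total columns, deliberately stored out of date order.

```python
import pandas as pd

# Hypothetical data for illustration only; rows are NOT in date order.
orders = pd.DataFrame({
    "user": ["u1", "u2", "u1", "u2"],
    "ts": ["2023-03-01", "2023-02-10", "2023-01-15", "2023-02-01"],
    "total": [50, 20, 30, 40],
})
orders["ts"] = pd.to_datetime(orders["ts"])

# Sort chronologically first, so that "first" per user means "earliest".
earliest = (
    orders.sort_values("ts", kind="stable")
          .drop_duplicates(subset=["user"], keep="first")
)
print(earliest)
```

Without the sort, the same call would keep whichever row happens to appear first in the frame for each user.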

🧠 Conceptual

expert

Effect of drop_duplicates() on DataFrame index

After calling drop_duplicates() on a DataFrame with its default arguments, what happens to the index of the resulting DataFrame?

A. The index is converted to a MultiIndex based on duplicate columns.

B. The original index values are preserved, including gaps from dropped rows.

C. The index is reset to a continuous range starting at 0 automatically.

D. The index is dropped and replaced with a default integer index only if reset_index() is called.
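Index behavior is easy to inspect directly. The sketch below, on a hypothetical single-column frame (demo), prints the index labels after a default call and after two ways of requesting a fresh index: the ignore_index=True argument available in recent pandas versions, and a chained reset_index(drop=True).

```python
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({"k": ["a", "a", "b", "b", "c"]})

plain = demo.drop_duplicates()                          # default arguments
renumbered = demo.drop_duplicates(ignore_index=True)    # ask pandas to relabel
reset = demo.drop_duplicates().reset_index(drop=True)   # relabel in a second step

# Compare the index labels produced by each approach.
print(list(plain.index))
print(list(renumbered.index))
print(list(reset.index))
```

Passing drop=True to reset_index() discards the old labels; without it, they are kept as a new column named index.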
