drop_duplicates() for removal in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of drop_duplicates() with subset and keep parameters

What is the output DataFrame after running the following code?

Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'z'],
    'C': [10, 20, 20, 30, 30, 30]
})

result = df.drop_duplicates(subset=['A'], keep='last')
print(result)
A
   A  B   C
0  1  x  10
2  2  y  20
5  3  z  30
B
   A  B   C
0  1  x  10
1  2  y  20
3  3  z  30
C
   A  B   C
0  1  x  10
1  2  y  20
2  2  y  20
3  3  z  30
4  3  z  30
5  3  z  30
D
   A  B   C
0  1  x  10
3  3  z  30
💡 Hint

Remember that subset=['A'] means duplicates are checked only on column 'A'. The keep='last' parameter keeps the last occurrence.
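The hint can be tried out on a separate toy example. This is a minimal sketch; the `fruit` DataFrame and its column names are hypothetical and deliberately differ from the data in the question.

```python
import pandas as pd

# Hypothetical data, not the DataFrame from the question above.
fruit = pd.DataFrame({
    'name': ['apple', 'apple', 'pear', 'pear', 'pear'],
    'price': [1.0, 1.2, 2.0, 2.1, 2.2],
})

# Only 'name' is compared; 'price' plays no part in spotting duplicates.
last_rows = fruit.drop_duplicates(subset=['name'], keep='last')
first_rows = fruit.drop_duplicates(subset=['name'], keep='first')

# The surviving row labels show which occurrence of each name was kept.
print(last_rows.index.tolist())
print(first_rows.index.tolist())
```

Comparing the two printed lists shows how `keep` decides which row of each duplicate group survives.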

Data Output
intermediate
Number of rows after drop_duplicates() with keep=False

Given the DataFrame below, how many rows remain after drop_duplicates() removes every row that has a duplicate (no copy of a duplicated row is kept), comparing only columns 'A' and 'B'?

Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3],
    'B': ['x', 'y', 'y', 'z', 'z', 'w'],
    'C': [10, 20, 20, 30, 30, 40]
})

result = df.drop_duplicates(subset=['A', 'B'], keep=False)
print(len(result))
A. 2
B. 3
C. 4
D. 5
💡 Hint

Using keep=False removes all rows that have duplicates in the specified subset.
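One way to see what keep=False does is to look at the boolean mask from `duplicated()`, which takes the same `subset` and `keep` arguments. The sketch below uses a hypothetical `pets` DataFrame, not the data from the question.

```python
import pandas as pd

# Hypothetical data, not the DataFrame from the question above.
pets = pd.DataFrame({
    'kind': ['cat', 'cat', 'dog', 'bird'],
    'color': ['black', 'black', 'brown', 'black'],
})

# With keep=False, every member of a duplicate group is marked True.
mask = pets.duplicated(subset=['kind', 'color'], keep=False)

unique_only = pets.drop_duplicates(subset=['kind', 'color'], keep=False)

# Dropping with keep=False matches filtering with the inverted mask.
assert unique_only.equals(pets[~mask])

print(mask.tolist())
print(unique_only)
```

Note that a row counts as a duplicate only if all columns in `subset` match, so the 'bird' row is not a duplicate of the 'cat' rows even though the color is the same.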

🔧 Debug
advanced
Identify the error in drop_duplicates() usage

What error will this code raise?

Pandas
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

result = df.drop_duplicates(subset='A,B')
print(result)
A. TypeError: unhashable type: 'list'
B. No error, prints the original DataFrame
C. ValueError: subset must be a list-like of column labels
D. KeyError: 'A,B'
💡 Hint

Check the type and format of the subset argument.
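For reference, here is a short sketch of the two accepted shapes of `subset`: a single column label or a list of labels. The `scores` DataFrame and its columns are hypothetical and unrelated to the question's data.

```python
import pandas as pd

# Hypothetical data, not the DataFrame from the question above.
scores = pd.DataFrame({
    'team': ['red', 'red', 'blue'],
    'round': [1, 1, 1],
    'points': [5, 7, 3],
})

# A single column label may be passed as a plain string.
by_team = scores.drop_duplicates(subset='team')

# Several columns are passed as a list, one label per element.
by_team_round = scores.drop_duplicates(subset=['team', 'round'])

# A string is always read as ONE label; it is never split on commas.
has_joined_label = 'team,round' in scores.columns

print(by_team.index.tolist())
print(by_team_round.index.tolist())
print(has_joined_label)
```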

🚀 Application
advanced
Remove duplicates keeping first occurrence per group

You have a DataFrame with sales data. You want to keep only the first sale per customer. Which code snippet achieves this?

Pandas
import pandas as pd

df = pd.DataFrame({
    'customer_id': [101, 102, 101, 103, 102],
    'sale_amount': [200, 150, 300, 400, 100],
    'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']
})
A. df.drop_duplicates(subset=['customer_id', 'sale_amount'], keep='last')
B. df.drop_duplicates(subset=['sale_amount'], keep='last')
C. df.drop_duplicates(subset=['customer_id'], keep='first')
D. df.drop_duplicates(subset=['date'], keep=False)
💡 Hint

Think about which column identifies customers and which occurrence to keep.
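A related point worth keeping in mind: "first" and "last" refer to row order, not to dates. If the rows are not already in chronological order, sorting before deduplicating is one possible approach. The sketch below assumes a hypothetical `orders` DataFrame whose rows are intentionally out of order.

```python
import pandas as pd

# Hypothetical data with rows deliberately out of chronological order.
orders = pd.DataFrame({
    'user': ['u2', 'u1', 'u1', 'u2'],
    'placed': ['2024-03-05', '2024-03-02', '2024-03-01', '2024-03-04'],
    'total': [50, 20, 10, 40],
})
orders['placed'] = pd.to_datetime(orders['placed'])

# Sort by date first so that "first" means "earliest" for each user.
earliest = (
    orders.sort_values('placed')
          .drop_duplicates(subset=['user'], keep='first')
)

print(earliest)
```

Without the `sort_values` call, the same `drop_duplicates` call would keep whichever row happens to appear first in the frame.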

🧠 Conceptual
expert
Effect of drop_duplicates() on DataFrame index

After using drop_duplicates() on a DataFrame, what happens to the index of the resulting DataFrame?

A. The index is converted to a MultiIndex based on duplicate columns.
B. The original index values are preserved, including gaps from dropped rows.
C. The index is reset to a continuous range starting at 0 automatically.
D. The index is dropped and replaced with a default integer index only if reset_index() is called.
💡 Hint

Consider what happens to row labels when rows are removed but no explicit index reset is done.
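To experiment with row labels, the sketch below prints the index after deduplication and then shows two ways to renumber it: `reset_index(drop=True)` and the `ignore_index=True` parameter (available in pandas 1.0 and later). The `letters` DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical data used only to inspect the row labels.
letters = pd.DataFrame({'letter': ['a', 'a', 'b', 'b', 'c']})

deduped = letters.drop_duplicates()
print(deduped.index.tolist())

# Two ways to get a fresh 0..n-1 index after deduplicating.
renumbered = deduped.reset_index(drop=True)
also_renumbered = letters.drop_duplicates(ignore_index=True)

print(renumbered.index.tolist())
print(also_renumbered.index.tolist())
```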