beginner

What does the drop_duplicates() function do in pandas?

It returns a new DataFrame with duplicate rows removed. Rows are compared on all columns by default, or only on selected columns if you pass subset.
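A minimal sketch of the default call, assuming a small made-up DataFrame (the column names and values are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical sample data: rows 1 and 2 are identical in every column.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy"],
    "city": ["Oslo", "Rome", "Rome", "Lima"],
})

# With no arguments, all columns are compared and a new DataFrame is returned.
unique_rows = df.drop_duplicates()

print(unique_rows)  # three rows remain; the original index labels are kept
print(len(df))      # the original DataFrame still has all four rows
```

Note that the surviving rows keep their original index labels, so the result here has a gap where the duplicate was dropped.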

beginner

How do you keep the first occurrence of duplicates when using drop_duplicates()?

By default, drop_duplicates() keeps the first occurrence and removes later duplicates. You can also set keep='first' explicitly.
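To see which row survives, it can help to compare the default with keep='last'. This is a sketch on made-up data (the id and score columns are hypothetical):

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are exact duplicates of each other.
df = pd.DataFrame({"id": [1, 1, 2], "score": [10, 10, 30]})

default_result = df.drop_duplicates()              # keep="first" is the default
explicit_first = df.drop_duplicates(keep="first")  # same result, stated explicitly
last_result = df.drop_duplicates(keep="last")      # keeps the final occurrence instead

print(default_result.index.tolist())  # row 0 survives, row 1 is dropped
print(last_result.index.tolist())     # row 1 survives, row 0 is dropped
```

The row values are the same either way for exact duplicates; the difference shows up in which index label is retained.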

intermediate

What parameter do you use to remove duplicates based on specific columns only?

Use the subset parameter with a list of column names to consider only those columns when identifying duplicates.
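A short sketch of subset, assuming a hypothetical DataFrame where two rows agree on columns A and B but differ in C:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 match on A and B, but not on C.
df = pd.DataFrame({
    "A": [1, 1, 2],
    "B": ["x", "x", "y"],
    "C": [100, 200, 300],
})

# Comparing all columns finds no duplicates, because C differs.
by_all_columns = df.drop_duplicates()

# Comparing only A and B treats rows 0 and 1 as duplicates; row 0 is kept.
by_a_and_b = df.drop_duplicates(subset=["A", "B"])

print(len(by_all_columns))
print(by_a_and_b)
```

Columns outside subset are ignored for the comparison but still appear in the result, taken from whichever row is kept.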

intermediate

How can you remove duplicates and modify the original DataFrame directly?

Set inplace=True in drop_duplicates(). The original DataFrame is modified directly, and the method returns None instead of a new DataFrame.
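A minimal sketch of the in-place form on a hypothetical one-column DataFrame. The variable name result is only there to show what the call returns:

```python
import pandas as pd

# Hypothetical data with one repeated value.
df = pd.DataFrame({"x": [1, 1, 2]})

# With inplace=True the call changes df itself and returns None.
result = df.drop_duplicates(inplace=True)

print(result)   # None
print(len(df))  # df now holds only the de-duplicated rows
```

Because the return value is None, writing df = df.drop_duplicates(inplace=True) would discard the data. The common alternative is reassignment without inplace: df = df.drop_duplicates().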

advanced

What happens if you set keep=False in drop_duplicates()?

Every row that has a duplicate is dropped, including its first occurrence, so only rows that appear exactly once remain.
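A sketch contrasting keep=False with the default, using hypothetical values where 1 and 3 repeat and 2 appears once:

```python
import pandas as pd

# Hypothetical data: 1 appears twice, 2 once, 3 three times.
df = pd.DataFrame({"x": [1, 1, 2, 3, 3, 3]})

# Default: one copy of each distinct value is kept.
one_of_each = df.drop_duplicates()

# keep=False: any value that occurs more than once is dropped entirely.
appear_once = df.drop_duplicates(keep=False)

print(one_of_each["x"].tolist())
print(appear_once["x"].tolist())
```

This makes keep=False useful for finding records that were never repeated, rather than for ordinary de-duplication.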

What is the default behavior of drop_duplicates() in pandas?

A. Remove all duplicates including first occurrences

B. Keep the first occurrence of duplicates

C. Keep the last occurrence of duplicates

D. Do nothing

Which parameter lets you specify columns to check for duplicates?

A. columns

B. axis

C. filter

D. subset

How do you remove duplicates and update the original DataFrame without creating a new one?

A. Set inplace=True

B. Set keep=False

C. Use drop() instead

D. Use reset_index()

What does keep=False do in drop_duplicates()?

A. Removes all duplicates including first occurrences

B. Keeps the last duplicate

C. Keeps all duplicates

D. Keeps the first duplicate

If you want to remove duplicates based on columns 'A' and 'B' only, which code is correct?

A. df.drop_duplicates(keep=['A', 'B'])

B. df.drop_duplicates(columns=['A', 'B'])

C. df.drop_duplicates(subset=['A', 'B'])

D. df.drop_duplicates(axis=['A', 'B'])

Explain how drop_duplicates() works and how you can control which duplicates to keep or remove.

Describe a real-life example where removing duplicates from data is important, and explain how drop_duplicates() helps.