Recall & Review
beginner
What does the 'keep' parameter do in pandas' drop_duplicates() method?
The 'keep' parameter decides which row of each duplicate group survives: 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False keeps none of them, dropping every row that has a duplicate.
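A minimal sketch of the three options side by side. The DataFrame and its column names are made up for illustration; only `drop_duplicates()` and its `keep` parameter come from pandas.

```python
import pandas as pd

# Hypothetical data: rows 0, 2 and 5 are identical, and so are rows 1 and 4.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid", "Bob", "Ann"],
    "city": ["Oslo", "Rome", "Oslo", "Lima", "Rome", "Oslo"],
})

first = df.drop_duplicates(keep="first")  # keep the earliest row of each group
last = df.drop_duplicates(keep="last")    # keep the latest row of each group
none = df.drop_duplicates(keep=False)     # keep no row that has a duplicate

print(first.index.tolist())  # [0, 1, 3]
print(last.index.tolist())   # [3, 4, 5]
print(none.index.tolist())   # [3]
```

The surviving index labels show which copy of each row was kept.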
intermediate
In pandas, what happens if you set keep=False in drop_duplicates()?
Every row that has a duplicate is removed, including its first occurrence, so only rows that appeared exactly once remain.
beginner
How does keep='first' differ from keep='last' in drop_duplicates()?
keep='first' keeps the first occurrence of each duplicate group and removes the rest, while keep='last' keeps the last occurrence and removes earlier ones.
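The choice matters most when duplicates are judged on only some columns, because the surviving row then carries different values in the others. The sketch below assumes a hypothetical table of sensor readings and uses the `subset` parameter of `drop_duplicates()` to compare rows by the `sensor` column alone.

```python
import pandas as pd

# Hypothetical readings: each sensor reports twice, in row order.
readings = pd.DataFrame({
    "sensor": ["a", "b", "a", "b"],
    "value": [10, 20, 11, 21],
})

# Rows count as duplicates when they share the same sensor.
earliest = readings.drop_duplicates(subset="sensor", keep="first")
latest = readings.drop_duplicates(subset="sensor", keep="last")

print(earliest["value"].tolist())  # [10, 20]
print(latest["value"].tolist())    # [11, 21]
```

Here "first" and "last" refer to row order in the DataFrame, so sort the data beforehand if order should reflect time.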
beginner
True or False: Using keep=False in drop_duplicates() will keep one row from each duplicate group.
False. keep=False removes every row in a duplicate group, so not even one row from such a group is kept.
intermediate
Why might you use keep=False instead of 'first' or 'last' when removing duplicates?
To ensure that only rows appearing exactly once remain. Every row that appears more than once is removed, which is useful when a repeated record is itself suspect and no copy should be trusted.
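One possible scenario, sketched with a hypothetical orders table: a repeated order is treated as a data-entry problem, so the repeated rows are set aside for review rather than silently collapsed. `DataFrame.duplicated()` accepts the same `keep` argument and is used here to show the rows that `keep=False` discards.

```python
import pandas as pd

# Hypothetical orders: order 102 was entered twice.
orders = pd.DataFrame({
    "order_id": [101, 102, 102, 103],
    "amount": [50, 75, 75, 20],
})

# Rows that appear exactly once.
unambiguous = orders.drop_duplicates(keep=False)

# Every row belonging to a duplicate group, kept aside for inspection.
flagged = orders[orders.duplicated(keep=False)]

print(unambiguous["order_id"].tolist())  # [101, 103]
print(flagged.index.tolist())            # [1, 2]
```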
What does keep='first' do in pandas drop_duplicates()?
keep='first' keeps the first row of each duplicate group and removes the others.
If you want to remove all rows that have duplicates, which keep option should you use?
keep=False removes all rows that have duplicates, leaving only unique rows.
What is the default value of the keep parameter in drop_duplicates()?
By default, keep='first' in drop_duplicates(), so the first occurrence is kept.
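A quick way to check the default, using a small made-up DataFrame: calling `drop_duplicates()` with no arguments should give the same result as passing `keep="first"` explicitly.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are duplicates.
df = pd.DataFrame({"x": [1, 1, 2]})

default_result = df.drop_duplicates()
explicit_result = df.drop_duplicates(keep="first")

print(default_result.equals(explicit_result))  # True
print(default_result.index.tolist())           # [0, 2]
```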
Which keep option keeps the last duplicate row?
keep='last' keeps the last occurrence of each duplicate group.
What happens if you set keep=False and there are no duplicates in the data?
If there are no duplicates, keep=False removes nothing because no rows are duplicated.
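This can be confirmed on a small made-up DataFrame whose rows are all distinct: the result of `keep=False` has the same contents as the input.

```python
import pandas as pd

# Hypothetical data with no repeated rows.
unique_df = pd.DataFrame({"x": [1, 2, 3]})

result = unique_df.drop_duplicates(keep=False)

print(result.equals(unique_df))  # True
```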
Explain the difference between keep='first', keep='last', and keep=False in pandas drop_duplicates().
Think about which rows remain after duplicates are removed.
Describe a situation where using keep=False would be better than keep='first' or 'last'.
Consider data cleaning for accurate results.