Recall & Review
beginner
What does the drop_duplicates() method do in pandas?
It returns a DataFrame with duplicate rows removed, keeping the first occurrence of each by default.
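A minimal sketch of the basic call, assuming a small made-up DataFrame (the column names and values are illustrative only):

```python
import pandas as pd

# Hypothetical sample data: the third row repeats the first exactly.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Ana"],
    "city": ["Lima", "Oslo", "Lima"],
})

# Returns a new DataFrame; df itself is left unchanged.
unique_rows = df.drop_duplicates()
print(unique_rows)
```

Here unique_rows has two rows, while df still has all three.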
beginner
How can you keep the last occurrence of a duplicate row using drop_duplicates()?
Use the parameter keep='last' to keep the last occurrence and remove the earlier ones.
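To see which rows survive under each setting, this sketch compares the index labels that remain. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 are identical.
df = pd.DataFrame({"item": ["pen", "cup", "pen"], "price": [2, 5, 2]})

first_kept = df.drop_duplicates()            # default is keep='first'
last_kept = df.drop_duplicates(keep="last")  # drop the earlier copy instead

print(first_kept.index.tolist())  # [0, 1]
print(last_kept.index.tolist())   # [1, 2]
```

The original index labels are preserved, so they show which copy of the duplicate was kept.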
intermediate
What parameter do you use to remove duplicates based on specific columns only?
Use the subset parameter with a list of column names to check for duplicates only on those columns.
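A short sketch of subset, assuming a hypothetical table where the same email appears under two spellings of a name:

```python
import pandas as pd

# Hypothetical data: the same email appears twice with different names.
df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com"],
    "name": ["Ann", "Bob", "Anne"],
})

all_columns = df.drop_duplicates()               # no row is a full duplicate
by_email = df.drop_duplicates(subset=["email"])  # compare the email column only

print(len(all_columns))  # 3
print(len(by_email))     # 2
```

Without subset nothing is removed, because the rows differ in the name column.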
intermediate
What happens if you set inplace=True in drop_duplicates()?
The DataFrame is modified directly and the method returns None instead of a new DataFrame, so duplicates are removed in the original DataFrame.
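The difference between the two styles can be sketched as follows, using made-up data. Reassigning the result is shown alongside inplace=True as an equivalent alternative.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 2]})

# With inplace=True the call returns None and df itself is changed.
result = df.drop_duplicates(inplace=True)
print(result)   # None
print(len(df))  # 2

# Equivalent alternative: reassign the returned DataFrame.
df2 = pd.DataFrame({"x": [1, 1, 2]})
df2 = df2.drop_duplicates()
```

A common mistake is writing df = df.drop_duplicates(inplace=True), which replaces df with None.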
beginner
Why is removing duplicates important in data analysis?
Duplicates can cause incorrect analysis results, like counting the same data multiple times or biasing statistics.
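As a rough illustration of the double-counting problem, the sketch below totals a hypothetical orders table before and after removing an order that was recorded twice:

```python
import pandas as pd

# Hypothetical orders: order 102 was recorded twice.
orders = pd.DataFrame({
    "order_id": [101, 102, 102, 103],
    "amount": [20.0, 50.0, 50.0, 10.0],
})

raw_total = orders["amount"].sum()  # counts order 102 twice

clean = orders.drop_duplicates()
clean_total = clean["amount"].sum()

print(raw_total)    # 130.0
print(clean_total)  # 80.0
```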
What is the default behavior of drop_duplicates() when duplicates are found?
By default, drop_duplicates() keeps the first occurrence of each duplicate row.
Which parameter lets you specify columns to check for duplicates?
The subset parameter takes a list of columns to consider when identifying duplicates.
What does setting inplace=True do in drop_duplicates()?
Setting inplace=True changes the original DataFrame by removing duplicates without returning a new one.
If you want to keep the last duplicate row instead of the first, which argument do you use?
Use keep='last' to keep the last occurrence of duplicates.
Why should duplicates be removed before analysis?
Duplicates can skew results by counting the same data multiple times, so removing them improves accuracy.
Explain how to use drop_duplicates() to remove duplicate rows based on specific columns and keep the last occurrence.
Think about which columns to check and which duplicate to keep.
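One possible answer, sketched with a hypothetical status log. It assumes the rows are already in chronological order, so the last occurrence is the most recent one.

```python
import pandas as pd

# Hypothetical log, assumed to be sorted from oldest to newest.
df = pd.DataFrame({
    "user": ["u1", "u2", "u1", "u2"],
    "status": ["new", "new", "active", "closed"],
})

# Compare rows on the user column only and keep each user's last row.
latest = df.drop_duplicates(subset=["user"], keep="last")
print(latest)
```

If the rows are not ordered, sorting them first (for example with sort_values on a timestamp column) would be needed for "last" to mean "latest".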
Describe why removing duplicates is important in data analysis and what problems duplicates can cause.
Consider how duplicates affect statistics and decisions.