Recall & Review
beginner
What does the duplicated() function in pandas do?
It returns a Boolean Series indicating whether each row is a duplicate of an earlier row. By default, True means the row repeats an earlier row and False means it is the first occurrence so far.
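A minimal sketch of the Boolean mask described above. The DataFrame and its column names are hypothetical sample data, not part of the original card.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann"],
    "city": ["Oslo", "Rome", "Oslo"],
})

# With the default keep='first', only the later repeat is flagged.
mask = df.duplicated()
print(mask.tolist())  # [False, False, True]
```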
beginner
How can you count the total number of duplicate rows in a DataFrame?
Use df.duplicated().sum(). Summing the Boolean Series counts each True as 1, so the result is the number of rows flagged as duplicates.
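As a rough illustration of the count, assuming a small made-up DataFrame with a single hypothetical column `id`:

```python
import pandas as pd

# Hypothetical data: 2 appears twice and 3 appears three times.
df = pd.DataFrame({"id": [1, 2, 2, 3, 3, 3]})

# One repeat of 2 plus two repeats of 3 are flagged under the default keep='first'.
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)  # 3
```

The first occurrence of each value is not counted, because the default only flags later repeats.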
intermediate
What does the keep parameter in duplicated() control?
It controls which occurrences in each group of identical rows are marked True:
- 'first' (the default) marks every occurrence except the first
- 'last' marks every occurrence except the last
- False marks every occurrence, including the first and the last
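The three keep options can be compared side by side. This is only a sketch on hypothetical data (a single column `x`), intended to show which positions each setting flags.

```python
import pandas as pd

# Hypothetical data: "a" occurs at positions 0, 2 and 3.
df = pd.DataFrame({"x": ["a", "b", "a", "a"]})

first = df.duplicated(keep="first").tolist()  # [False, False, True, True]
last = df.duplicated(keep="last").tolist()    # [True, False, True, False]
every = df.duplicated(keep=False).tolist()    # [True, False, True, True]

print(first, last, every)
```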
intermediate
How do you count duplicates based on specific columns only?
Pass the column names to the subset parameter of duplicated().
Example: df.duplicated(subset=['col1', 'col2']).sum()
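A short sketch of how subset changes the count. The third column `col3` and all values here are assumptions added for illustration; only `col1` and `col2` come from the card's example.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 match on col1 and col2 but differ on col3.
df = pd.DataFrame({
    "col1": [1, 1, 2],
    "col2": ["x", "x", "y"],
    "col3": [10, 20, 30],
})

# Comparing whole rows finds no duplicates.
full_rows = int(df.duplicated().sum())
# Comparing only col1 and col2 flags row 1 as a repeat of row 0.
by_subset = int(df.duplicated(subset=["col1", "col2"]).sum())

print(full_rows, by_subset)  # 0 1
```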
beginner
What is the difference between duplicated() and drop_duplicates()?
duplicated() marks duplicate rows with True/False.
drop_duplicates() returns a new DataFrame with the duplicate rows removed; by default the original DataFrame is left unchanged.
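The contrast can be seen on a tiny example. This is a sketch on hypothetical data that relies on the default behavior of both methods.

```python
import pandas as pd

# Hypothetical data: the value 1 appears twice.
df = pd.DataFrame({"x": [1, 1, 2]})

# duplicated() returns one Boolean flag per row.
flags = df.duplicated()
# drop_duplicates() returns a new DataFrame and leaves df as it was.
deduped = df.drop_duplicates()

print(len(flags), len(deduped), len(df))  # 3 2 3
```

Note that deduped keeps the original index labels of the surviving rows unless ignore_index=True is passed.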
Which pandas function returns a Boolean Series marking duplicate rows?
duplicated() returns True for duplicate rows and False otherwise.
How do you count the number of duplicate rows in a DataFrame df?
df.duplicated() returns a Boolean Series; summing it counts the duplicates.
What does df.duplicated(keep=False) do?
keep=False marks every occurrence of a duplicated row as True, including the first one.
To check duplicates based on columns 'A' and 'B' only, which is correct?
The subset parameter specifies which columns to check for duplicates.
Which function removes duplicate rows from a DataFrame?
drop_duplicates() returns a DataFrame without duplicate rows.
Explain how to find and count duplicate rows in a pandas DataFrame.
Think about marking duplicates first, then counting them.
Describe the difference between duplicated() and drop_duplicates() in pandas.
One marks duplicates, the other removes them.