
Counting duplicates in Pandas - Cheat Sheet & Quick Revision

Recall & Review
beginner
What does the duplicated() function in pandas do?
It returns a Boolean Series indicating whether each row is a duplicate of an earlier row.
With the default keep='first', True means the row repeats an earlier row and False means it has not appeared before.
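A minimal sketch of this behaviour, using a small made-up DataFrame (the column names and values are assumptions chosen only for illustration):

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical, as are rows 1 and 4.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cara", "Bob"],
    "city": ["Oslo", "Rome", "Oslo", "Lima", "Rome"],
})

# With the default keep='first', only the later repeats are flagged.
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False, True]
```

The result has one entry per row and shares the DataFrame's index, so it can be used directly as a filter, for example df[mask].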
beginner
How can you count the total number of duplicate rows in a DataFrame?
Use df.duplicated().sum().
This counts the rows that repeat an earlier row; the first occurrence of each row is not counted.
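As a rough illustration on made-up data (the column name x is an assumption), the count can be compared with two related numbers: every row that belongs to a duplicated group, and the number of rows that dropping duplicates would remove.

```python
import pandas as pd

# Hypothetical data: the value 1 appears three times, 3 appears twice.
df = pd.DataFrame({"x": [1, 1, 1, 2, 3, 3]})

# Rows that repeat an earlier row (first occurrences are not counted).
n_repeats = int(df.duplicated().sum())
print(n_repeats)  # 3

# Every row that has at least one identical twin, first occurrences included.
n_in_groups = int(df.duplicated(keep=False).sum())
print(n_in_groups)  # 5

# Cross-check: the same as the number of rows drop_duplicates() would remove.
n_removed = len(df) - len(df.drop_duplicates())
print(n_removed)  # 3
```

Summing works because True counts as 1 and False as 0.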
intermediate
What does the keep parameter in duplicated() control?
It controls which duplicates to mark as True:
- 'first' (the default) marks duplicates except the first occurrence
- 'last' marks duplicates except the last occurrence
- False marks all duplicates as True
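The three settings can be compared side by side on a tiny example. This is only a sketch; the column name k and its values are assumptions.

```python
import pandas as pd

# Hypothetical data: "a" appears at positions 0, 2 and 3; "b" appears once.
df = pd.DataFrame({"k": ["a", "b", "a", "a"]})

keep_first = df.duplicated(keep="first").tolist()
keep_last = df.duplicated(keep="last").tolist()
keep_none = df.duplicated(keep=False).tolist()

print(keep_first)  # [False, False, True, True]
print(keep_last)   # [True, False, True, False]
print(keep_none)   # [True, False, True, True]
```

Note that the unique row "b" is False in every case: keep only changes which members of a duplicated group are flagged.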
intermediate
How do you count duplicates based on specific columns only?
Pass the column names to the subset parameter in duplicated().
Example: df.duplicated(subset=['col1', 'col2']).sum()
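A short sketch of how the count can change once only some columns are compared. The data is made up; col3 is a hypothetical extra column added to make the rows differ.

```python
import pandas as pd

# Hypothetical data: col1/col2 repeat three times, but col3 differs in row 1.
df = pd.DataFrame({
    "col1": [1, 1, 2, 1],
    "col2": ["x", "x", "y", "x"],
    "col3": [10, 20, 30, 10],
})

# Comparing whole rows: only row 3 repeats an earlier row (row 0).
full_row_count = int(df.duplicated().sum())
print(full_row_count)  # 1

# Comparing col1 and col2 only: rows 1 and 3 both repeat row 0.
subset_count = int(df.duplicated(subset=["col1", "col2"]).sum())
print(subset_count)  # 2
```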
beginner
What is the difference between duplicated() and drop_duplicates()?
duplicated() marks duplicate rows with True/False and leaves the data unchanged.
drop_duplicates() returns a new DataFrame with the duplicate rows removed; by default it does not modify the original.
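The relationship between the two can be sketched as follows, on made-up data with an assumed column name id: filtering with the inverted mask gives the same rows that drop_duplicates() returns with its default settings.

```python
import pandas as pd

# Hypothetical data: the value 2 appears twice.
df = pd.DataFrame({"id": [1, 2, 2, 3]})

# duplicated() only reports which rows repeat; df is left as it was.
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False]

# drop_duplicates() returns a new DataFrame without the repeated rows.
deduped = df.drop_duplicates()
print(len(df), len(deduped))  # 4 3

# Keeping the rows where the mask is False gives the same result.
same = df[~mask].equals(deduped)
print(same)  # True
```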
Which pandas function returns a Boolean Series marking duplicate rows?
A. duplicated()
B. drop_duplicates()
C. count_duplicates()
D. is_duplicate()
How do you count the number of duplicate rows in a DataFrame df?
A. df.count_duplicates()
B. df.drop_duplicates().count()
C. df.duplicated().sum()
D. df.duplicated().count()
What does df.duplicated(keep=False) do?
A. Marks all duplicates as True, including first occurrences
B. Marks only the first duplicate as True
C. Marks only the last duplicate as True
D. Removes duplicates from df
To check duplicates based on columns 'A' and 'B' only, which call uses the correct keyword argument?
A. df.duplicated(['A', 'B'])
B. df.duplicated(columns=['A', 'B'])
C. df.duplicated(cols=['A', 'B'])
D. df.duplicated(subset=['A', 'B'])
Which function removes duplicate rows from a DataFrame?
A. duplicated()
B. drop_duplicates()
C. remove_duplicates()
D. delete_duplicates()
Explain how to find and count duplicate rows in a pandas DataFrame.
Think about marking duplicates first, then counting them.
Describe the difference between duplicated() and drop_duplicates() in pandas.
One marks duplicates, the other removes them.