
duplicated() for finding duplicates in Pandas - Cheat Sheet & Quick Revision

Recall & Review
beginner
What does the duplicated() function in pandas do?
It returns a Boolean Series indicating whether each row is a duplicate of a previous row in the DataFrame.
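As a minimal sketch of that behavior, the example below builds a small DataFrame and calls duplicated() with no arguments. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid"],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
})

# Boolean Series aligned with df's index; True means "repeat of an earlier row".
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False]
```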
beginner
How can you use duplicated() to find all duplicate rows except the first occurrence?
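Call duplicated() with its default keep='first'. The first occurrence of each row is marked False and every later repeat is marked True, so filtering with the result, df[df.duplicated()], returns only the repeats.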
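One way to pull out those later repeats is Boolean indexing with the mask. This is a small sketch with invented data, not the only approach.

```python
import pandas as pd

# Hypothetical data: the row (1, 10) appears three times.
df = pd.DataFrame({"id": [1, 2, 1, 1], "score": [10, 20, 10, 10]})

# Keep only rows flagged True, i.e. every repeat after the first occurrence.
repeats = df[df.duplicated()]
print(repeats.index.tolist())  # [2, 3]
```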
intermediate
What parameter of duplicated() controls which duplicates are marked as True?
The keep parameter controls this. It can be 'first' (default), 'last', or False to mark all duplicates as True.
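The three keep settings can be compared side by side on the same data. The single-column DataFrame here is an assumption chosen to keep the output easy to read.

```python
import pandas as pd

# Hypothetical data: "a" appears at positions 0, 2 and 3.
df = pd.DataFrame({"x": ["a", "b", "a", "a"]})

first = df.duplicated(keep="first").tolist()  # first "a" is spared
last = df.duplicated(keep="last").tolist()    # last "a" is spared
every = df.duplicated(keep=False).tolist()    # no "a" is spared

print(first)  # [False, False, True, True]
print(last)   # [True, False, True, False]
print(every)  # [True, False, True, True]
```

keep=False is useful when you want to inspect every member of a duplicate group rather than just the extras.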
intermediate
How do you find duplicates based on specific columns using duplicated()?
Use the subset parameter to specify columns to check for duplicates instead of the entire row.
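A short sketch of subset in action, assuming a hypothetical table where two rows share an email but differ in another column:

```python
import pandas as pd

# Hypothetical data: same email twice, but with different plans.
df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com"],
    "plan": ["free", "free", "pro"],
})

# Comparing whole rows finds nothing, because the plans differ.
whole_row = df.duplicated().tolist()
# Comparing only the email column flags the second "a@x.com".
by_email = df.duplicated(subset=["email"]).tolist()

print(whole_row)  # [False, False, False]
print(by_email)   # [False, False, True]
```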
intermediate
What is the difference between duplicated() and drop_duplicates()?
duplicated() returns a Boolean mask showing duplicates, while drop_duplicates() returns a DataFrame with duplicates removed.
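To make the relationship concrete, the sketch below shows that, with default arguments, dropping duplicates gives the same rows as filtering with the inverted mask. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical.
df = pd.DataFrame({"k": [1, 1, 2], "v": ["a", "a", "b"]})

mask = df.duplicated()          # Boolean Series: [False, True, False]
deduped = df.drop_duplicates()  # DataFrame without the repeated row
via_mask = df[~mask]            # same rows, selected with the inverted mask

print(deduped.equals(via_mask))  # True
```

The mask is the better choice when you want to count, inspect, or log duplicates before deciding whether to remove them.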
What does df.duplicated() return?
A. A DataFrame with duplicates removed
B. The count of duplicate rows
C. A Boolean Series marking duplicate rows except the first occurrence
D. The unique rows only
Which keep parameter value marks all duplicates as True in duplicated()?
A. 'last'
B. False
C. 'first'
D. None
How do you check duplicates based on only some columns?
A. Use unique() on those columns
B. Use drop_duplicates() without parameters
C. Use groupby() only
D. Use the subset parameter in duplicated()
What does df.duplicated(keep='last') do?
A. Marks duplicates as True except the last occurrence
B. Marks duplicates as True except the first occurrence
C. Marks all duplicates as True
D. Removes duplicates from DataFrame
Which function returns a DataFrame with duplicates removed?
A. drop_duplicates()
B. duplicated()
C. unique()
D. isnull()
Explain how to use duplicated() to find duplicate rows in a DataFrame and how to customize which duplicates are marked.
Think about which rows are marked True and how to change that.
Describe the difference between duplicated() and drop_duplicates() and when you might use each.
Consider output type and purpose.