import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 4, 4],
                   'B': ['x', 'y', 'y', 'z', 'x', 'x', 'x']})
result = df.duplicated()
print(result.tolist())
By default (keep='first'), duplicated() compares entire rows across all columns and marks a row True if an identical row appeared earlier. The first occurrence of each row is marked False.
Here row 2, (2, 'y'), repeats row 1, and rows 5 and 6, (4, 'x'), repeat row 4. Those three rows are True, so the code prints [False, False, True, False, False, True, True].
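A minimal sketch that reruns the question's data and then changes one value, to show that without subset a row is flagged only when every column matches an earlier row. The edit to row 6 is an illustrative assumption, not part of the original question.

```python
import pandas as pd

# Same data as the question above
df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 4, 4],
                   'B': ['x', 'y', 'y', 'z', 'x', 'x', 'x']})
mask = df.duplicated()  # keep='first' by default; compares all columns
print(mask.tolist())    # [False, False, True, False, False, True, True]

# Illustrative change: A still repeats in row 6, but the full row no longer does
df.loc[6, 'B'] = 'w'
print(df.duplicated().tolist())  # [False, False, True, False, False, True, False]
```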
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 5, 7, 6, 8, 5]})
duplicates = df.duplicated(subset=['A'], keep='last')
count = duplicates.sum()
print(count)
Values 5 and 6 appear more than once. With keep='last', every occurrence except the last is marked True (a duplicate), and the last occurrence is marked False.
For 5: positions 0 and 2 are duplicates (True), and position 6 is the last occurrence (False). For 6: position 1 is a duplicate (True), and position 4 is the last occurrence (False). The mask therefore sums to 3, which is what print(count) outputs.
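The position-by-position reasoning above can be checked directly. This sketch lists which row labels are flagged under keep='last' and, for contrast, under the default keep='first'; the count is the same, but the flagged rows differ.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 5, 7, 6, 8, 5]})
last = df.duplicated(subset=['A'], keep='last')
first = df.duplicated(subset=['A'], keep='first')

# Select the True entries of each mask and read off their row labels
print(last[last].index.tolist())    # [0, 1, 2]
print(first[first].index.tolist())  # [2, 4, 6]
print(last.sum(), first.sum())      # 3 3
```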
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
result = df.duplicated(keep='middle')
print(result)
The keep parameter only accepts 'first', 'last', or False. 'middle' is invalid and causes a ValueError.
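A small sketch that tries each accepted keep value plus 'middle' and records what happens. Only the exception type is relied on here; the exact error message text may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

results = {}
for keep in ('first', 'last', False, 'middle'):
    try:
        results[keep] = df.duplicated(keep=keep).tolist()
    except ValueError as exc:
        # 'middle' is not an accepted value, so pandas raises here
        results[keep] = f'ValueError: {exc}'
    print(repr(keep), '->', results[keep])
```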
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3],
                   'B': ['a', 'b', 'b', 'c']})
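duplicated(keep=False) marks every copy of a repeated row as True, including its first occurrence. Inverting that mask with ~ and using it to index df keeps only the rows that appear exactly once; here those are rows 0 and 3.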
Option C correctly filters unique rows.
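A minimal sketch of the filtering idiom described above, assuming the expression is written as df[~df.duplicated(keep=False)]. It also compares the result with drop_duplicates(keep=False), which removes every copy of a repeated row and should select the same rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': ['a', 'b', 'b', 'c']})

# keep=False flags every copy of a repeated row; ~ keeps only the rest
unique_rows = df[~df.duplicated(keep=False)]
print(unique_rows)

# drop_duplicates(keep=False) selects the same rows
same = df.drop_duplicates(keep=False)
print(unique_rows.equals(same))  # True
```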
With keep=False, duplicated() marks all duplicates as True, including the first and last occurrences.
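To make the difference concrete, this sketch reuses the column from the keep='last' question and counts flagged rows for each keep setting. With keep=False, all three 5s and both 6s are flagged, so the count rises from 3 to 5.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 5, 7, 6, 8, 5]})

counts = {keep: int(df.duplicated(keep=keep).sum())
          for keep in ('first', 'last', False)}
print(counts)  # {'first': 3, 'last': 3, False: 5}

# Every occurrence of a repeated value is flagged, first and last included
all_copies = df.duplicated(keep=False)
print(all_copies[all_copies].index.tolist())  # [0, 1, 2, 4, 6]
```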
Using subset=['X'] limits the comparison to column 'X': rows count as duplicates when their 'X' values match, whatever the other columns contain.
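The question's DataFrame is not shown, so this sketch uses a hypothetical frame with columns 'X' and 'Y' (the data is illustrative only). Row 1 shares 'X' with row 0 but differs in 'Y', so it is flagged only when the check is limited to 'X'.

```python
import pandas as pd

# Hypothetical data: only the column name 'X' comes from the question
df = pd.DataFrame({'X': [1, 1, 2, 2], 'Y': ['a', 'b', 'c', 'c']})

print(df.duplicated().tolist())              # [False, False, False, True]
print(df.duplicated(subset=['X']).tolist())  # [False, True, False, True]
```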