Challenge - 5 Problems

Duplicate Detection Master

❓ Predict Output

intermediate

Output of duplicated() with default parameters

What is the output of the following code snippet?

Pandas

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 4, 4], 'B': ['x', 'y', 'y', 'z', 'x', 'x', 'x']})
result = df.duplicated()
print(result.tolist())

A. [True, False, True, False, True, False, True]

B. [False, True, True, False, True, True, True]

C. [False, False, False, False, False, False, False]

D. [False, False, True, False, False, True, True]
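
As a study aid, the sketch below illustrates how duplicated() compares entire rows under the default keep='first'. The DataFrame is hypothetical and deliberately different from the question's data; it only demonstrates the rule that a row is flagged when every column matches an earlier row.

```python
import pandas as pd

# Hypothetical data (not the question's): rows 0 and 2 match on every column,
# while row 1 matches row 0 only in column 'A', so it is not a full-row duplicate.
df = pd.DataFrame({'A': [10, 10, 10, 20], 'B': ['p', 'q', 'p', 'p']})

# Default keep='first': each repeat of an identical row after its first occurrence is True.
flags = df.duplicated()
print(flags.tolist())  # [False, False, True, False]
```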

❓ Data Output

intermediate

Count of duplicated rows with subset and keep='last'

Given the DataFrame below, how many rows are marked as duplicates when using subset=['A'] and keep='last'?

Pandas

import pandas as pd

df = pd.DataFrame({'A': [5, 6, 5, 7, 6, 8, 5]})
duplicates = df.duplicated(subset=['A'], keep='last')
count = duplicates.sum()
print(count)
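
The following sketch, on hypothetical data unrelated to the question, shows how subset restricts the comparison to the listed columns, how keep='last' leaves the final occurrence unflagged, and why summing the boolean result gives a count. The identity checked at the end (rows minus distinct values) is stated here only for data without missing values.

```python
import pandas as pd

# Hypothetical data, different from the question's, to show keep='last' with subset.
df = pd.DataFrame({'A': [1, 2, 1, 1, 3], 'B': ['u', 'v', 'w', 'x', 'y']})

# subset=['A'] compares only column 'A'; keep='last' leaves the final occurrence unflagged.
last_flags = df.duplicated(subset=['A'], keep='last')
print(last_flags.tolist())  # [True, False, True, False, False]

# Summing a boolean Series counts its True values.
n_dupes = int(last_flags.sum())
print(n_dupes)  # 2

# With no missing values, the count equals rows minus distinct values in the subset,
# whether 'first' or 'last' is the occurrence that is kept.
assert n_dupes == len(df) - df['A'].nunique()
assert n_dupes == int(df.duplicated(subset=['A'], keep='first').sum())
```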

🔧 Debug

advanced

Identify the error in duplicated() usage

What error, if any, will this code raise?

Pandas

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
result = df.duplicated(keep='middle')
print(result)

A. ValueError: keep must be either 'first', 'last', or False

B. TypeError: duplicated() got an unexpected keyword argument 'keep'

C. SyntaxError: invalid syntax

D. No error, prints a boolean Series
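
A minimal sketch of how the keep argument is validated: the three documented values ('first', 'last', False) each return a boolean Series, while any other value is rejected with a ValueError. The invalid value 'both' is a hypothetical stand-in, and the example catches only the exception type rather than relying on the exact message text, which may vary between pandas versions.

```python
import pandas as pd

# Non-empty frame on purpose; the value 1 appears twice.
df = pd.DataFrame({'A': [1, 1, 2]})

results = {}
# 'both' is a hypothetical invalid value used to trigger validation.
for keep in ('first', 'last', False, 'both'):
    try:
        results[keep] = df.duplicated(keep=keep).tolist()
    except ValueError as exc:
        # Record only the exception type; message wording is version-dependent.
        results[keep] = type(exc).__name__
    print(repr(keep), results[keep])
```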

🚀 Application

advanced

Filter unique rows using duplicated()

Which code snippet correctly filters the DataFrame to keep only the rows that have no duplicate anywhere in the DataFrame (every copy of a repeated row is dropped), comparing all columns?

Pandas

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': ['a', 'b', 'b', 'c']})

A. df[df.duplicated(keep=False)]

B. df[df.duplicated(keep='first')]

C. df[~df.duplicated(keep=False)]

D. df[~df.duplicated(keep='last')]
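
The sketch below contrasts two filtering goals that are easy to confuse: keeping only rows that never repeat versus keeping one copy of each distinct row. It uses hypothetical data and cross-checks both masks against DataFrame.drop_duplicates, which accepts the same keep argument.

```python
import pandas as pd

# Hypothetical data: the row (1, 'a') appears twice, the others once.
df = pd.DataFrame({'A': [1, 1, 2, 3], 'B': ['a', 'a', 'b', 'c']})

# Rows that never repeat: invert a mask that flags every copy of a repeated row.
only_once = df[~df.duplicated(keep=False)]

# Deduplicated view: one copy of each distinct row survives (the first one here).
deduped = df[~df.duplicated(keep='first')]

# drop_duplicates applies the same keep semantics.
assert only_once.equals(df.drop_duplicates(keep=False))
assert deduped.equals(df.drop_duplicates())

print(only_once.index.tolist())  # [2, 3]
print(deduped.index.tolist())    # [0, 2, 3]
```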

🧠 Conceptual

expert

Understanding duplicated() behavior with subset and keep=False

Consider a DataFrame with columns 'X' and 'Y'. If you run df.duplicated(subset=['X'], keep=False), what does the output represent?

A. Marks only the first occurrence of each duplicate value in column 'X' as True.

B. Marks all rows that have duplicate values in column 'X', including the first and last occurrences.

C. Marks only the last occurrence of each duplicate value in column 'X' as True.

D. Marks no rows as duplicates because keep=False disables duplicate detection.
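
To make the subset/keep=False combination concrete, here is a sketch on a hypothetical frame with columns 'X' and 'Y'. Only 'X' is compared, so the differing 'Y' values play no role. As a cross-check, the same mask is rebuilt by counting how often each 'X' value occurs with value_counts and map.

```python
import pandas as pd

# Hypothetical frame with columns 'X' and 'Y'; every 'Y' value is distinct.
df = pd.DataFrame({'X': ['k', 'm', 'k', 'n', 'm'], 'Y': [1, 2, 3, 4, 5]})

# Only 'X' is compared; keep=False flags every row whose 'X' value occurs more than once.
all_copies = df.duplicated(subset=['X'], keep=False)
print(all_copies.tolist())  # [True, True, True, False, True]

# Cross-check: map each 'X' value to its frequency and flag frequencies above 1.
counts = df['X'].map(df['X'].value_counts())
assert (counts > 1).tolist() == all_copies.tolist()
```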
