beginner

What does the drop_duplicates() method do in pandas?

It removes duplicate rows from a DataFrame and returns a new DataFrame. By default it keeps the first occurrence of each duplicated row and drops the later copies.
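
A minimal sketch of the default behavior, using a small made-up DataFrame in which rows 0 and 2 are identical. The original frame is left unchanged because a new DataFrame is returned.

```python
import pandas as pd

# Made-up example data: rows 0 and 2 are exact duplicates
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Ana", "Cy"],
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
})

deduped = df.drop_duplicates()  # default keep='first'

print(deduped.index.tolist())  # [0, 1, 3]: row 2 (the second "Ana, Oslo") is dropped
print(len(df))                 # 4: the original DataFrame is untouched
```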

beginner

How can you keep the last occurrence of a duplicate row using drop_duplicates()?

Pass keep='last' to keep the last occurrence of each duplicated row and drop the earlier ones.
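
A short sketch comparing the keep options on the same made-up data. Besides 'first' and 'last', pandas also accepts keep=False, which drops every row that has a duplicate; that option is not mentioned in the card above but is part of the documented API.

```python
import pandas as pd

# Made-up example data: rows 0 and 2 are exact duplicates
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Ana", "Cy"],
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
})

first = df.drop_duplicates()                 # keep='first' (default)
last = df.drop_duplicates(keep="last")       # keep the later copy instead
none_kept = df.drop_duplicates(keep=False)   # drop all copies of duplicated rows

print(first.index.tolist())      # [0, 1, 3]
print(last.index.tolist())       # [1, 2, 3]
print(none_kept.index.tolist())  # [1, 3]
```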

intermediate

What parameter do you use to remove duplicates based on specific columns only?

Use the subset parameter with a column label or a list of column labels; rows are then compared only on those columns, and the other columns are ignored when deciding what counts as a duplicate.
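
A minimal sketch with a hypothetical order log (column names are illustrative only). No two rows are fully identical, so a plain call removes nothing, while subset treats rows that share a customer and date as duplicates.

```python
import pandas as pd

# Hypothetical order log: same customer and date, different amounts
orders = pd.DataFrame({
    "customer": ["c1", "c2", "c1", "c3"],
    "date": ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02"],
    "amount": [10, 25, 12, 7],
})

# Full-row comparison: every row differs somewhere, so all 4 rows stay
full = orders.drop_duplicates()
print(len(full))  # 4

# Compare only customer and date: row 2 repeats row 0 on those columns
by_key = orders.drop_duplicates(subset=["customer", "date"])
print(by_key.index.tolist())  # [0, 1, 3]
```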

intermediate

What happens if you set inplace=True in drop_duplicates()?

The original DataFrame is modified directly and the method returns None instead of a new DataFrame. Because of this, writing df = df.drop_duplicates(inplace=True) would set df to None.
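
A small sketch contrasting inplace=True with reassigning the returned DataFrame, which is a common alternative style. The data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 2]})

# inplace=True changes df itself and returns None
result = df.drop_duplicates(inplace=True)
print(result)            # None
print(df["x"].tolist())  # [1, 2]

# Alternative: keep the default (inplace=False) and reassign the new DataFrame
df2 = pd.DataFrame({"x": [1, 1, 2]})
df2 = df2.drop_duplicates()
print(df2["x"].tolist())  # [1, 2]
```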

beginner

Why is removing duplicates important in data analysis?

Duplicates can distort results: the same record is counted more than once, which inflates counts and totals and biases statistics such as the mean.
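
A small illustration with a hypothetical sales table in which one order was loaded twice. The duplicate inflates the row count and the total, and it pulls the mean toward the repeated value.

```python
import pandas as pd

# Hypothetical data: order 102 was accidentally loaded twice
sales = pd.DataFrame({
    "order_id": [101, 102, 102, 103],
    "amount": [10.0, 50.0, 50.0, 30.0],
})

# Statistics on the raw table count order 102 twice
print(len(sales), sales["amount"].sum(), sales["amount"].mean())  # 4 140.0 35.0

# After removing the duplicate row, each order counts once
clean = sales.drop_duplicates()
print(len(clean), clean["amount"].sum(), clean["amount"].mean())  # 3 90.0 30.0
```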

What is the default behavior of drop_duplicates() when duplicates are found?

A. Removes all duplicates, including the first

B. Keeps the last occurrence and removes the rest

C. Keeps the first occurrence and removes the rest

D. Keeps all duplicates

Which parameter lets you specify columns to check for duplicates?

A. columns

B. subset

C. filter

D. axis

What does setting inplace=True do in drop_duplicates()?

A. Modifies the original DataFrame directly

B. Duplicates are marked but not removed

C. Does nothing

D. Returns a new DataFrame without duplicates

If you want to keep the last duplicate row instead of the first, which argument do you use?

A. keep='first'

B. keep='none'

C. keep='all'

D. keep='last'

Why should duplicates be removed before analysis?

A. To avoid biased or incorrect results

B. To reduce file size only

C. Duplicates improve accuracy

D. Duplicates are always needed

Explain how to use drop_duplicates() to remove duplicate rows based on specific columns and keep the last occurrence.

Describe why removing duplicates is important in data analysis and what problems duplicates can cause.