Which of the following reasons best explains why data cleaning often consumes the majority of time in data analysis?
Think about what problems raw data usually has before you can analyze it.
Raw data often has errors, missing values, and inconsistencies. Fixing these issues carefully takes time, which is why data cleaning is the longest step.
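As a concrete illustration of those issues, here is a minimal sketch of typical cleaning steps on a made-up pandas DataFrame (column names and values are assumptions for the example):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'age':  [25, np.nan, 30, 30, 200],       # a missing value and an implausible outlier
    'city': ['NY', 'ny', 'LA', 'LA', 'SF'],  # inconsistent casing
})

df['city'] = df['city'].str.upper()               # fix inconsistent categories
df = df.drop_duplicates()                         # remove exact duplicate rows
df['age'] = df['age'].fillna(df['age'].median())  # impute missing values
df = df[df['age'] < 120]                          # drop implausible outliers
print(df)
```

Each step is quick to write but slow to get right on real data, which is where the time goes.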
Given the following Python code that replaces missing values in a DataFrame column with the column mean, what is the resulting DataFrame?
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5]})
mean_val = df['A'].mean()
df['A'] = df['A'].fillna(mean_val)
print(df)
Calculate the mean ignoring NaN, then fill NaN with that mean.
By default, mean() skips NaN, so the mean is computed over the non-missing values: (1 + 2 + 4 + 5) / 4 = 3.0. The NaN is replaced by 3.0, giving the column [1.0, 2.0, 3.0, 4.0, 5.0].
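Running the snippet confirms this. The sketch below reproduces the steps and checks both the computed mean and the filled column:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5]})
mean_val = df['A'].mean()           # skipna=True by default: mean of [1, 2, 4, 5]
df['A'] = df['A'].fillna(mean_val)  # NaN at index 2 becomes 3.0
```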
What error will this Python code raise when trying to drop rows with missing values?
import pandas as pd
df = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]})
df.dropna(inplace=True, axis=2)
Check the valid axis values for dropna in a pandas DataFrame.
DataFrames only support axis=0 (rows) or axis=1 (columns); axis=2 is invalid, so pandas raises a ValueError.
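A hedged sketch of the valid usage, plus a check that the invalid axis does raise ValueError (the DataFrame is the one from the question):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]})

rows_kept = df.dropna(axis=0)  # drops rows 1 and 2, which each contain a NaN
cols_kept = df.dropna(axis=1)  # drops both columns, since each contains a NaN

try:
    df.dropna(axis=2)          # invalid: no axis 2 on a DataFrame
except ValueError:
    print("axis=2 raised ValueError")
```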
You have a dataset with some extreme outlier values in a numeric column. Which method is best to reduce their impact before analysis?
Consider a method that reduces outlier effect without losing data.
Winsorization caps extreme values to reduce their effect but keeps all data points, making it a balanced approach.
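One common way to winsorize is to clip values at chosen percentiles, here sketched with NumPy on made-up data (the 5th/95th percentile bounds are an assumption for the example):

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 100])        # 100 is an extreme outlier
low, high = np.percentile(data, [5, 95])     # percentile bounds for capping
winsorized = np.clip(data, low, high)        # cap values outside [low, high]
```

All six data points are kept; only the extreme value is pulled in toward the bulk of the distribution.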
You see a boxplot of a numeric column before and after cleaning. The 'before' plot shows many points outside whiskers, the 'after' plot shows fewer. What does this indicate?
Think about what fewer points outside whiskers mean in a boxplot.
Fewer points outside whiskers after cleaning means outliers were removed or capped, improving data quality.
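The points outside the whiskers correspond to the standard 1.5 × IQR rule. A minimal sketch of flagging such points with NumPy (the data values are made up for illustration):

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 50])    # 50 lies far from the bulk
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # whisker bounds
outliers = data[(data < lower) | (data > upper)] # points drawn outside whiskers
```

After removing or capping such points, the redrawn boxplot shows fewer markers beyond the whiskers, which is exactly the before/after difference described above.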