Challenge - 5 Problems

🎖️

Missing Data Master

Get all challenges correct to earn this badge!

Test your skills under time pressure!

❓ Predict Output

intermediate

Time limit: 2:00

What is the output of this code with missing data?

Consider the following pandas DataFrame with missing values. What will be the output of the code below?

Pandas

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [5, np.nan, np.nan, 8]})
result = df['A'].mean()
print(round(result, 2))

A) 2.25

B) 2.33

C) nan

D) 3.0
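
For reference, pandas reductions such as Series.mean skip NaN by default (skipna=True), while NumPy's np.mean propagates it. The sketch below contrasts these behaviors on a hypothetical Series s whose values are unrelated to the quiz:

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value (not the quiz data).
s = pd.Series([10.0, np.nan, 20.0])

# Default skipna=True: NaN is ignored, so this is (10 + 20) / 2 = 15.0.
skipped = s.mean()

# skipna=False: any NaN makes the result NaN.
strict = s.mean(skipna=False)

# NumPy's plain mean propagates NaN; np.nanmean skips it like pandas does.
np_plain = np.mean(s.to_numpy())
np_skip = np.nanmean(s.to_numpy())

print(skipped, strict, np_plain, np_skip)
```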


❓ Data Output

intermediate

Time limit: 2:00

How many rows remain after dropping missing data?

Given this DataFrame, how many rows remain after dropping all rows with any missing values?

Pandas

import pandas as pd
import numpy as np
df = pd.DataFrame({'X': [1, np.nan, 3, 4], 'Y': [np.nan, 2, 3, 4]})
df_clean = df.dropna()
print(len(df_clean))
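
dropna() uses how="any" by default, so a row is dropped if any column is missing. The how, subset, and thresh parameters change that rule. A minimal sketch on a hypothetical frame (not the quiz data), where row 0 is fully missing, row 1 has two values, row 2 has one, and row 3 is complete:

```python
import numpy as np
import pandas as pd

# Hypothetical frame chosen to make each dropna rule behave differently.
df = pd.DataFrame({
    "P": [np.nan, 1.0, np.nan, 4.0],
    "Q": [np.nan, np.nan, 3.0, 4.0],
    "R": [np.nan, 2.0, np.nan, 4.0],
})

any_missing = df.dropna()              # default how="any": drop rows with any NaN
all_missing = df.dropna(how="all")     # drop only rows where every value is NaN
by_subset = df.dropna(subset=["Q"])    # only check column Q for NaN
at_least_two = df.dropna(thresh=2)     # keep rows with at least 2 non-NaN values

for name, frame in [("any", any_missing), ("all", all_missing),
                    ("subset Q", by_subset), ("thresh=2", at_least_two)]:
    print(name, list(frame.index))
```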


🔧 Debug

advanced

Time limit: 2:00

What error, if any, does this code raise when filling missing data?

What error, if any, will this code produce when filling the missing values?

Pandas

import pandas as pd
df = pd.DataFrame({'A': [1, 2, None], 'B': ['x', 'y', None]})
df['B'] = df['B'].fillna(0)
print(df)

A) TypeError

B) KeyError

C) ValueError

D) No error, fills missing with 0
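
When a frame mixes numeric and text columns, DataFrame.fillna also accepts a dict that maps each column name to its own fill value, so each column can get a type-appropriate replacement in one call. A minimal sketch on a frame shaped like the question's; the "missing" placeholder string is an arbitrary choice for illustration:

```python
import pandas as pd

# None in the numeric column becomes NaN (float64);
# None in the text column is also treated as missing.
df = pd.DataFrame({"A": [1, 2, None], "B": ["x", "y", None]})

# Per-column fill values: 0 for the numbers, a placeholder string for the text.
filled = df.fillna({"A": 0, "B": "missing"})

print(filled)
```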


🧠 Conceptual

advanced

Time limit: 2:00

Why is handling missing data important before modeling?

Which of the following best explains why handling missing data is important before building a machine learning model?

A) Models ignore missing data automatically, so no need to handle it.

B) Missing data always improves model accuracy.

C) Missing data can cause models to crash or produce biased results.

D) Handling missing data slows down model training without benefits.
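
A quick way to see what is at stake before modeling is to measure how much data is missing per column and how many rows a plain dropna() would discard. A minimal sketch on a hypothetical feature table (column names and values are illustrative assumptions, not from the quiz):

```python
import numpy as np
import pandas as pd

# Hypothetical feature table with gaps in several columns.
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "income": [np.nan, np.nan, 52000, 61000],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Count and share of missing values per column.
missing_count = df.isna().sum()
missing_share = df.isna().mean()

# Rows that a plain dropna() would throw away.
rows_lost = len(df) - len(df.dropna())

print(missing_count.to_dict())
print(missing_share.to_dict())
print("rows lost by dropna():", rows_lost)
```

Here only one of four rows is complete, so dropping rows would discard most of the data; imputation or a missing-value indicator is often preferred in cases like this.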


🚀 Application

expert

Time limit: 3:00

Which option produces the correct imputed DataFrame?

Given this DataFrame with missing values, which code snippet correctly fills missing numeric values with the column mean and missing categorical values with the mode? Assume each option runs on a fresh copy of the DataFrame.

Pandas

import pandas as pd
import numpy as np
df = pd.DataFrame({'num': [1, 2, np.nan, 4], 'cat': ['a', np.nan, 'a', 'b']})

A)
df['num'] = df['num'].fillna(df['num'].mean())
df['cat'] = df['cat'].fillna(df['cat'].mode()[0])
print(df)

B)
df.fillna(df.mean(), inplace=True)
df.fillna(df.mode().iloc[0], inplace=True)
print(df)

C)
df['num'] = df['num'].fillna(df['num'].median())
df['cat'] = df['cat'].fillna(df['cat'].mode()[0])
print(df)

D)
df = df.ffill()
print(df)
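
The same mean-for-numeric, mode-for-categorical rule can be applied to every column by checking each column's dtype. A minimal sketch, assuming a hypothetical helper impute_mean_mode and illustrative data (not the quiz frame); it works on a copy and assigns results back instead of calling fillna with inplace=True on a selected column:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def impute_mean_mode(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill numeric columns with their mean and
    all other columns with their most frequent value (mode)."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            fill = out[col].mean()
        else:
            modes = out[col].mode()
            if modes.empty:  # column is entirely missing; leave it as-is
                continue
            fill = modes.iloc[0]
        out[col] = out[col].fillna(fill)
    return out


# Illustrative data: the mean of score is 20.0 and the mode of grade is "B".
df = pd.DataFrame({"score": [10, np.nan, 20, 30], "grade": ["B", "A", np.nan, "B"]})
result = impute_mean_mode(df)
print(result)
```

Returning a new DataFrame rather than mutating the input makes it easy to compare the data before and after imputation.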
