Challenge - 5 Problems

Missing Values Master

🧠 Conceptual

intermediate

Why is it important to handle missing values before training a machine learning model?

Choose the best reason why missing values must be handled before training a model.

A. Missing values always improve model accuracy by adding randomness.

B. Missing values can cause errors or unexpected behavior in many machine learning algorithms.

C. Models automatically ignore missing values, so handling them is optional.

D. Missing values reduce the size of the dataset, which speeds up training.
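
Before deciding how to handle missing values, it usually helps to see where they are. The snippet below is a minimal sketch of that inspection step using pandas; the column names and values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the columns and values are made up for illustration.
df = pd.DataFrame({
    "age": [25, np.nan, 40],
    "income": [50000, 62000, np.nan],
})

# Count the missing entries in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Checking these counts first makes it easier to judge whether dropping rows or imputing values is the more reasonable choice for a given column.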

❓ Predict Output

intermediate

What is the output of this code that fills missing values with the column mean?

Given the following code, what will be the resulting DataFrame?

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan]})
df_filled = df.fillna(df.mean())
print(df_filled)

A.
     A    B
0  1.0  4.0
1  2.0  5.0
2  3.0  4.5

B.
     A    B
0  1.0  4.0
1  NaN  5.0
2  3.0  NaN

C.
     A    B
0  1.0  4.0
1  2.0  NaN
2  3.0  4.5

D.
     A    B
0  1.0  4.0
1  3.0  5.0
2  3.0  4.5

❓ Model Choice

advanced

Which model type is most robust to missing values without imputation?

Choose the model that can handle missing values internally without needing to fill them first.

A. Decision Trees

B. Linear Regression

C. K-Nearest Neighbors

D. Support Vector Machines

❓ Metrics

advanced

How does improper handling of missing values affect model evaluation metrics?

What is the most likely effect on accuracy if missing values are dropped from the test set but not from the training set?

A. Accuracy will be unaffected because missing values are only in training data.

B. Accuracy will be artificially low because the model sees fewer test samples.

C. Accuracy will be exactly the same as if missing values were imputed.

D. Accuracy will be artificially high because the test set is cleaner than training data.
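
For contrast with the scenario in this question, here is a minimal sketch of handling the training and test sets consistently: the imputer learns its fill values from the training data only and then applies them to the test data, so no test rows are dropped. The arrays are hypothetical toy data, and mean imputation is just one possible strategy.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data; the values are made up for illustration.
X_train = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan]])
X_test = np.array([[np.nan, 1.0], [5.0, np.nan]])

imputer = SimpleImputer(strategy="mean")

# Learn the column means from the training data only.
X_train_filled = imputer.fit_transform(X_train)

# Reuse the training means on the test data; every test row is kept.
X_test_filled = imputer.transform(X_test)

print(X_test_filled)
```

Fitting on the training set and only transforming the test set keeps both sets on the same preprocessing path, so the evaluation reflects the data the model would actually receive.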

🔧 Debug

expert

What error does this code raise when imputing missing values with scikit-learn's SimpleImputer?

Examine the code below and select the error it produces when run.

ML Python

from sklearn.impute import SimpleImputer
import numpy as np

X = np.array([[1, 2], [np.nan, 3], [7, 6]])
imputer = SimpleImputer(strategy='median')
X_imputed = imputer.fit_transform(X)
print(X_imputed)

A. ValueError: Cannot use median strategy with non-numeric data

B. TypeError: 'SimpleImputer' object is not callable

C. No error; output is [[1. 2.]
 [4. 3.]
 [7. 6.]]

D. AttributeError: 'numpy.ndarray' object has no attribute 'fit_transform'
