ML Python programming · ~20 mins

Model comparison strategies in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
Model Comparison Master
Get all challenges correct to earn this badge!
🧠 Conceptual · intermediate
Understanding Cross-Validation Purpose

Why do we use cross-validation when comparing machine learning models?

A. To estimate model performance on unseen data by splitting the data into multiple train-test sets
B. To increase the training data size by duplicating samples
C. To reduce the number of features in the dataset
D. To speed up the training process by using smaller datasets
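The idea behind k-fold cross-validation can be sketched with scikit-learn's `KFold` on arbitrary toy data (the values below are placeholders; only the indices matter):

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples; with n_splits=5, each fold holds out two of them.
X = np.arange(10).reshape(-1, 1)
kf = KFold(n_splits=5)

test_indices = []
for train_idx, test_idx in kf.split(X):
    test_indices.extend(test_idx.tolist())

# Every sample lands in a test fold exactly once, so averaging the five
# fold scores estimates performance on data the model did not train on.
print(sorted(test_indices))
```

Because each sample is held out exactly once, the averaged fold score is an estimate of performance on unseen data rather than on the training set.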
Metrics · intermediate
Choosing the Best Metric for Imbalanced Data

You have a classification problem with very imbalanced classes. Which metric is best to compare models fairly?

A. F1-score
B. Mean Squared Error
C. Precision
D. Accuracy
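A quick sketch of why accuracy can mislead on imbalanced classes, using hypothetical labels (95 negatives, 5 positives) and a degenerate model that always predicts the majority class:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate "model" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)
print(acc, f1)  # accuracy looks strong at 0.95, F1 is 0.0
```

The F1-score balances precision and recall on the minority class, so it exposes a model that never finds any positives even though its accuracy looks excellent.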
Predict Output · advanced
Output of Model Comparison Using Cross-Validation Scores

What is the output of the following Python code?

ML Python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
X, y = iris.data, iris.target
model = DecisionTreeClassifier(random_state=42)
scores = cross_val_score(model, X, y, cv=5)
print(round(scores.mean(), 2))
A. 0.75
B. 0.85
C. 0.95
D. 0.65
Hyperparameter · advanced
Effect of Increasing Number of Folds in Cross-Validation

What is the main effect of increasing the number of folds (cv) in k-fold cross-validation?

A. It increases bias and decreases variance of the performance estimate
B. It decreases bias and increases variance of the performance estimate
C. It decreases both bias and variance of the performance estimate
D. It increases both bias and variance of the performance estimate
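The trade-off can be explored empirically on the same iris/decision-tree setup used above (an illustrative sketch; exact numbers depend on the data and estimator). With more folds, each model trains on a larger share of the data, which reduces the pessimistic bias of the estimate, while the individual test folds shrink:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=42)

# Larger k: each fit sees (k-1)/k of the data, but each test fold
# holds only n/k samples, so individual fold scores fluctuate more.
for k in (2, 5, 10):
    scores = cross_val_score(model, X, y, cv=k)
    print(f"cv={k:2d}  mean={scores.mean():.3f}  std={scores.std():.3f}")
```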
🔧 Debug · expert
Identifying the Error in Model Comparison Code

What issue does the following code raise when it compares two models using cross-validation?

ML Python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
X, y = iris.data, iris.target

model1 = LogisticRegression(max_iter=200)
model2 = DecisionTreeClassifier()

scores1 = cross_val_score(model1, X, y, cv=5)
scores2 = cross_val_score(model2, X, y, cv=5)

print(scores1.mean())
print(scores2.mean())
A. NameError because model2 is not defined
B. TypeError because cross_val_score expects a list of models
C. ValueError because the iris dataset has missing values
D. ConvergenceWarning due to LogisticRegression's default max_iter being too low
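If the issue in question is the lbfgs solver's convergence warning, one common remedy (a sketch, not the quiz's official solution) is to standardize the features, which lets the solver converge in far fewer iterations, and to give `max_iter` generous headroom:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# StandardScaler inside a pipeline is fitted on each training fold only,
# so the test folds stay unseen; max_iter=1000 adds solver headroom.
model1 = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model2 = DecisionTreeClassifier(random_state=0)

scores1 = cross_val_score(model1, X, y, cv=5)
scores2 = cross_val_score(model2, X, y, cv=5)

print(round(scores1.mean(), 2), round(scores2.mean(), 2))
```

Wrapping the scaler and classifier in a single pipeline is the standard way to keep preprocessing inside the cross-validation loop, so the comparison between the two models stays fair.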