ML Python programming · ~20 mins

Why proper evaluation prevents overfitting in ML Python - Challenge Your Understanding

Challenge - 5 Problems
🎖️
Overfitting Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual · intermediate · 2:00 time limit
Why do we use a separate test set in machine learning?

Imagine you train a model on some data and then check how well it works on the same data. Why is it important to use a different set of data (test set) to evaluate the model?

A. Because the test set helps us see how the model performs on new, unseen data, preventing us from thinking it works better than it really does.
B. Because the test set is used to train the model faster by giving it extra examples.
C. Because the test set contains only easy examples that make the model look good.
D. Because the test set is used to tune the model’s parameters during training.
Attempts: 2 left
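To see why checking a model on its own training data is misleading, here is a minimal sketch. The dataset, classifier, and random_state=0 are illustrative choices, not taken from the challenge:

```python
# An unconstrained decision tree can memorize its training rows,
# so only a held-out test set shows how it does on unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Accuracy on the data the model was fit to: optimistic.
train_acc = accuracy_score(y_train, model.predict(X_train))
# Accuracy on held-out data: the honest estimate.
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"Train: {train_acc:.2f}  Test: {test_acc:.2f}")
```

The training score alone says almost nothing about generalization; the held-out score is the one to trust.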
Predict Output · intermediate · 2:00 time limit
What is the output of this model evaluation code?

Given the following code that splits data and evaluates a model, what will be printed?

ML Python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
predictions_train = model.predict(X_train)
predictions_test = model.predict(X_test)
print(f"Train accuracy: {accuracy_score(y_train, predictions_train):.2f}")
print(f"Test accuracy: {accuracy_score(y_test, predictions_test):.2f}")
A. Train accuracy: 1.00, Test accuracy: 0.98
B. Train accuracy: 0.98, Test accuracy: 1.00
C. Train accuracy: 0.50, Test accuracy: 0.50
D. Train accuracy: 0.85, Test accuracy: 0.85
Attempts: 2 left
Metrics · advanced · 2:00 time limit
Which metric best detects overfitting in classification?

You train a classifier and get very high accuracy on training data but much lower accuracy on test data. Which metric helps you best understand this overfitting problem?

A. Recall on test data only
B. Precision on training data only
C. F1 score on training data only
D. Difference between training accuracy and test accuracy
Attempts: 2 left
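The difference between training and test accuracy can be computed directly. A minimal sketch, with the dataset, model, and random_state=7 as illustrative assumptions:

```python
# The train-test accuracy gap is a simple overfitting signal:
# a large positive gap means the model fits the training data
# much better than unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7
)

model = DecisionTreeClassifier(random_state=7).fit(X_train, y_train)
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))

gap = train_acc - test_acc
print(f"Train: {train_acc:.2f}  Test: {test_acc:.2f}  Gap: {gap:.2f}")
```

A single metric on one split (recall, precision, or F1 on training data alone) cannot reveal the problem; only comparing the two splits can.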
Model Choice · advanced · 2:00 time limit
Which model choice helps reduce overfitting?

You want to prevent overfitting on a small dataset. Which model choice is best?

A. Use the training data as the test data
B. Use a simpler model with fewer parameters
C. Use no validation and train longer
D. Use a very deep neural network with many layers
Attempts: 2 left
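One way to see the simpler-model idea in action is to compare an unconstrained tree with a depth-limited one on a small, noisy synthetic dataset. The make_classification parameters and max_depth=3 below are illustrative assumptions, not values from the challenge:

```python
# A simpler model (fewer effective parameters) usually shows a
# smaller train-test gap on a small, noisy dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, m in [("deep tree", deep), ("shallow tree (max_depth=3)", shallow)]:
    print(f"{name}: train={m.score(X_train, y_train):.2f}  "
          f"test={m.score(X_test, y_test):.2f}")
```

The deep tree memorizes the noisy training labels (train accuracy 1.00), while the depth-limited tree typically shows a smaller gap between its two scores.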
🔧 Debug · expert · 2:00 time limit
Why does this evaluation code give misleading results?

Look at this code snippet. It trains and evaluates a model but the evaluation results are misleading. Why?

ML Python
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

data = load_digits()
X, y = data.data, data.target
model = DecisionTreeClassifier()
model.fit(X, y)
predictions = model.predict(X)
print(f"Accuracy: {accuracy_score(y, predictions):.2f}")
A. Because accuracy_score requires test data, not training data.
B. Because the model was not trained at all before prediction.
C. Because the model is evaluated on the same data it was trained on, causing overfitting to be hidden.
D. Because the dataset is too small to train any model.
Attempts: 2 left
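A minimal sketch of one possible fix for the snippet above: split before fitting, then evaluate on rows the model never saw. The test_size=0.25 and random_state=0 values are illustrative choices:

```python
# Fixed evaluation: hold out a test set so memorization is exposed
# instead of hidden.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)  # fit on the training rows only

# Report both: a large gap between them is the overfitting warning sign.
print(f"Train accuracy: {accuracy_score(y_train, model.predict(X_train)):.2f}")
print(f"Test accuracy:  {accuracy_score(y_test, model.predict(X_test)):.2f}")
```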