Challenge - 5 Problems
Overfitting Mastery
🧠 Conceptual
Difficulty: intermediate
Why do we use a separate test set in machine learning?
Imagine you train a model on some data and then check how well it works on the same data. Why is it important to use a different set of data (test set) to evaluate the model?
❓ Predict Output
Difficulty: intermediate
What is the output of this model evaluation code?
Given the following code that splits data and evaluates a model, what will be printed?
ML Python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

predictions_train = model.predict(X_train)
predictions_test = model.predict(X_test)

print(f"Train accuracy: {accuracy_score(y_train, predictions_train):.2f}")
print(f"Test accuracy: {accuracy_score(y_test, predictions_test):.2f}")
❓ Metrics
Difficulty: advanced
Which metric best detects overfitting in classification?
You train a classifier and get very high accuracy on the training data but much lower accuracy on the test data. Which metric best helps you diagnose this overfitting problem?
❓ Model Choice
Difficulty: advanced
Which model choice helps reduce overfitting?
You want to prevent overfitting on a small dataset. Which model choice is best?
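As a rough illustration of what "model choice" can mean here, the sketch below compares an unconstrained decision tree with a deliberately simpler one using 5-fold cross-validation. The dataset (iris), the `max_depth=3` and `min_samples_leaf=5` settings, and the `candidates` and `scores` names are assumptions chosen for the example, not part of the challenge. Whether the constrained model actually scores better depends on the data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Small dataset used purely for illustration (assumption, not from the challenge).
X, y = load_iris(return_X_y=True)

# Two candidate models: one free to grow until every leaf is pure,
# one whose capacity is limited by depth and minimum leaf size.
candidates = {
    "unconstrained tree": DecisionTreeClassifier(random_state=42),
    "depth-limited tree": DecisionTreeClassifier(
        max_depth=3, min_samples_leaf=5, random_state=42
    ),
}

# Mean accuracy over 5 folds; each fold is scored on data not used for fitting.
scores = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(model, X, y, cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: mean CV accuracy = {scores[name]:.2f}")
```

Cross-validation is used instead of a single split because, on a small dataset, one held-out split can give a noisy estimate.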
🔧 Debug
Difficulty: expert
Why does this evaluation code give misleading results?
Look at this code snippet. It trains and evaluates a model, but the evaluation results are misleading. Why?
ML Python
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

data = load_digits()
X, y = data.data, data.target

model = DecisionTreeClassifier()
model.fit(X, y)

predictions = model.predict(X)
print(f"Accuracy: {accuracy_score(y, predictions):.2f}")
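For comparison, here is a minimal sketch of the same experiment with a held-out split, so that the reported score comes from samples the model never saw during fitting. The 70/30 ratio and the `random_state=42` values are illustrative assumptions, not part of the original snippet.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_digits()
X, y = data.data, data.target

# Hold out 30% of the samples; they are never passed to fit().
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Score on both sets: the training score shows how well the tree memorized
# its inputs, the test score estimates performance on unseen data.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))

print(f"Train accuracy: {train_acc:.2f}")
print(f"Test accuracy: {test_acc:.2f}")
```

Printing both numbers side by side makes any gap between training and held-out performance visible.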