Which of the following best describes overfitting in machine learning?
Think about when a model learns too much detail from the training data.
Overfitting happens when a model learns the training data too well, including noise, so it fails to generalize to new data.
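As an illustrative sketch (not part of the question), the effect can be shown by comparing a low-degree and a high-degree polynomial fit on the same noisy data; the data, degrees, and seed here are arbitrary choices for demonstration:

```python
# Sketch: a degree-9 polynomial memorizes 10 noisy points (near-zero
# training error) but fits the underlying trend y = x worse than a line.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = x_train + rng.normal(0, 0.1, 10)  # true relation y = x, plus noise

x_test = np.linspace(0, 1, 100)
y_test = x_test  # noise-free targets for evaluation

errors = {}
for degree in (1, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    errors[degree] = (train_err, test_err)
    # degree 9 typically shows much lower train error but higher test error
    print(degree, train_err, test_err)
```

The degree-9 fit passes through all 10 training points, so its training error is essentially zero, yet it wiggles between them and generalizes worse — the signature of overfitting.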
What is the output of the following Python code that fits a simple linear regression and predicts a value?
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])
model = LinearRegression().fit(X, y)
prediction = model.predict([[6]])
print(round(prediction[0], 2))
Look at the pattern in y relative to X.
The data follows y = 2 * X exactly, so the fitted line has slope 2 and intercept 0. Predicting for X = 6 gives 2 * 6 = 12, and the code prints 12.0.
Given a binary classification confusion matrix below, what is the accuracy?
[[50, 10], [5, 35]]
Accuracy = (True Positives + True Negatives) / Total samples.
Accuracy = (50 + 35) / (50 + 10 + 5 + 35) = 85 / 100 = 0.85 or 85%.
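The same calculation can be done directly on the matrix. This sketch assumes the common convention that rows are actual classes and columns are predicted classes, so the diagonal holds the correct predictions:

```python
# Sketch: accuracy from the confusion matrix in the question.
import numpy as np

cm = np.array([[50, 10], [5, 35]])
accuracy = np.trace(cm) / cm.sum()  # (50 + 35) / 100
print(accuracy)  # -> 0.85
```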
Which statement best describes the role of the root node in a decision tree?
Think about where the tree starts splitting data.
The root node is the top node where the first split happens, usually on the most important feature.
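As a sketch of how to inspect this in scikit-learn (the iris dataset here is just an illustrative choice), the root node's split feature can be read from the fitted tree's internal arrays:

```python
# Sketch: reading the feature chosen at the root node of a fitted tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Node 0 is the root; tree_.feature[0] is the index of its split feature.
root_feature = clf.tree_.feature[0]
print(iris.feature_names[root_feature])
```

Because the root split is chosen first, it is made over all the training data, which is why it typically lands on a highly informative feature.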
What error will the following code raise when run?
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
model = KMeans(n_clusters=2, random_state=0)
model.fit(X)
print(model.predict([[0, 0]]))
Check if KMeans has a predict method and if input shapes are correct.
KMeans in scikit-learn does have a predict method, and [[0, 0]] has the same number of features (2) as the training data, so no error is raised. The code prints a one-element array containing the label of the cluster whose centroid is nearer to [0, 0] — the cluster of the three points with x = 1. Whether that label is 0 or 1 depends on how the centroids were initialized, so the exact printed label is not guaranteed across versions.
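This reasoning can be checked without relying on a specific label value: the point [0, 0] should land in the same cluster as the training points with x = 1, whatever that cluster's label happens to be. A minimal sketch (n_init is set explicitly only to keep the behavior stable across scikit-learn versions):

```python
# Sketch: [0, 0] gets the same label as training point [1, 2],
# regardless of which numeric label that cluster was assigned.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

label_for_origin = model.predict([[0, 0]])[0]
label_for_x1_cluster = model.labels_[0]  # label of training point [1, 2]
print(label_for_origin == label_for_x1_cluster)  # -> True
```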