You have a machine learning model deployed for predicting house prices. Over time, the model's accuracy drops significantly. What is the most likely reason you need to retrain the model?
Think about what happens when the world changes but the model stays the same.
When the distribution of incoming data shifts over time (data drift), the patterns the model learned no longer match reality and its predictions become less accurate. Retraining on recent data lets the model adapt.
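As a minimal sketch of this effect (the prices and the "model" here are invented for illustration), a predictor fit to old data degrades once the price distribution shifts:

```python
# Hypothetical illustration of data drift: a trivial model fit on old
# house prices performs much worse once the market distribution shifts.
old_prices = [200, 210, 190, 205, 195]          # training-era prices
new_prices = [300, 310, 290, 305, 295]          # market after drift

# The "model" simply predicts the mean of its training data
prediction = sum(old_prices) / len(old_prices)

def mean_abs_error(prices, pred):
    return sum(abs(p - pred) for p in prices) / len(prices)

print(mean_abs_error(old_prices, prediction))   # small error on old data
print(mean_abs_error(new_prices, prediction))   # much larger error after drift
```

Retraining on `new_prices` would pull the prediction back toward the current market.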
What will be the output of the following Python code snippet that simulates incremental retraining?
```python
class Model:
    def __init__(self):
        self.data_count = 0

    def train(self, new_data):
        self.data_count += len(new_data)

    def predict(self):
        return self.data_count

model = Model()
model.train([1, 2, 3])
model.train([4, 5])
print(model.predict())
```
Count how many items were added in total.
The model is trained twice: first with 3 items, then with 2. Each call adds the batch size to `data_count`, so `predict` returns the running total, 5.
You have a model that receives continuous streaming data. Which retraining strategy is best to keep the model updated without retraining from scratch every time?
Think about efficiency and keeping the model fresh with new data.
Incremental learning updates the model with new data without full retraining, making it efficient for streaming data.
After retraining a classification model, which metric change best indicates the retraining improved the model?
Look for improvements on data the model hasn't seen before.
Higher accuracy and lower loss on held-out validation data (data the model never trained on) show that retraining improved generalization, not just the fit to the training set.
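A short sketch of that comparison (assuming scikit-learn; the synthetic data and the "before"/"after" models are invented for illustration) evaluates both model versions on the same untouched validation split:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# "Before retraining": a model fit on only a small slice of the data
before = LogisticRegression().fit(X_train[:20], y_train[:20])

# "After retraining": the same model class fit on the full training set
after = LogisticRegression().fit(X_train, y_train)

# Compare both on the SAME held-out validation set neither model saw
print(f"before: {before.score(X_val, y_val):.2f}")
print(f"after:  {after.score(X_val, y_val):.2f}")
```

The key point is that both scores come from data excluded from training; only then does an improvement mean better generalization.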
Consider this retraining code snippet. What is the main issue causing overly optimistic validation results?
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = [[i] for i in range(100)]
y = [0] * 50 + [1] * 50

# Split data
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Retrain model
model = LogisticRegression()
model.fit(X_train + X_val, y_train + y_val)

# Evaluate
score = model.score(X_val, y_val)
print(f"Validation accuracy: {score}")
```
Check if validation data is used during training.
The call `model.fit(X_train + X_val, y_train + y_val)` trains on the validation data too, so scoring on `X_val` measures performance on examples the model has already seen. This data leakage makes the reported validation accuracy overly optimistic rather than a true estimate of generalization.
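A corrected version of the same snippet fits only on the training split, so the validation score is an honest estimate:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = [[i] for i in range(100)]
y = [0] * 50 + [1] * 50

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ONLY on the training split; the validation set stays unseen
model = LogisticRegression()
model.fit(X_train, y_train)

score = model.score(X_val, y_val)
print(f"Validation accuracy: {score}")
```

Keeping the validation set out of `fit` is the one change needed; everything else in the original snippet was fine.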