ML Python (~20 mins)

Retraining strategies in ML Python - ML Experiment: Train & Evaluate

Experiment - Retraining strategies

Problem: You have a model trained on old data that now performs poorly on new data. Its accuracy on new data is only 65%, compared with 90% on the old data.

Current Metrics: Training accuracy: 90%, new data accuracy: 65%

Issue: The model does not generalize well to new data because it was trained only once, on old data, and the new data has shifted since then. It needs retraining to adapt.
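
Before retraining, it can help to confirm that the inputs really have moved. The following is a rough sketch, not part of the exercise itself: it assumes synthetic stand-ins for the old and new data (the same shapes and 0.5 offset used in the solution) and reports how far each feature's mean has shifted, measured in standard deviations of the old data.

```python
import numpy as np

rng = np.random.default_rng(0)
X_old = rng.normal(size=(1000, 10))        # assumed stand-in for the old data
X_new = rng.normal(size=(300, 10)) + 0.5   # assumed stand-in for the shifted new data

# Shift of each feature's mean, in units of the old data's standard deviation
shift = (X_new.mean(axis=0) - X_old.mean(axis=0)) / X_old.std(axis=0)

for i, s in enumerate(shift):
    print(f"feature {i}: mean shift = {s:+.2f} old std")
```

A mean comparison like this only detects changes in the inputs; it says nothing about whether the relationship between inputs and labels has changed.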

Your Task

Improve the model's accuracy on new data to at least 80% by applying retraining strategies.

You can only retrain the model using a combination of old and new data.

You cannot change the model architecture.

You must keep training time reasonable (no more than double the original training time).
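
The training-time limit can be checked by timing the original fit and the retraining fit. This is a minimal sketch under stated assumptions: the data is synthetic and shaped like the solution's, `timed_fit` is a hypothetical helper, and timings for a model this small are noisy and machine-dependent, so the ratio is only indicative.

```python
import time

import numpy as np
from sklearn.linear_model import LogisticRegression


def timed_fit(X, y):
    """Fit a fresh LogisticRegression and return (model, seconds elapsed)."""
    start = time.perf_counter()
    model = LogisticRegression(max_iter=200).fit(X, y)
    return model, time.perf_counter() - start


# Synthetic stand-ins for the old and new data (assumed, for illustration)
rng = np.random.default_rng(0)
X_old = rng.normal(size=(1000, 10))
y_old = (X_old[:, 0] + X_old[:, 1] > 0).astype(int)
X_new = rng.normal(size=(300, 10)) + 0.5
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)

# Time the original fit and the fit on combined data
_, t_original = timed_fit(X_old, y_old)
_, t_retrain = timed_fit(np.vstack((X_old, X_new)), np.hstack((y_old, y_new)))

ratio = t_retrain / t_original
print(f"Original fit: {t_original:.4f}s, retraining: {t_retrain:.4f}s, ratio: {ratio:.2f}")
print("Within budget" if ratio <= 2.0 else "Over budget")
```

For a more stable estimate, each fit could be repeated several times and the median time compared.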

Solution

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Simulate old data
np.random.seed(42)
X_old = np.random.randn(1000, 10)
y_old = (X_old[:, 0] + X_old[:, 1] > 0).astype(int)

# Simulate new data with a slight shift
X_new = np.random.randn(300, 10) + 0.5
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)

# Initial training on old data
model = LogisticRegression(max_iter=200)
model.fit(X_old, y_old)

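# Hold out part of the new data so evaluation uses examples never seen in training
X_new_train, X_new_test, y_new_train, y_new_test = train_test_split(
    X_new, y_new, test_size=0.3, random_state=42, stratify=y_new
)

# Evaluate the original model on the held-out new data before retraining
y_pred_before = model.predict(X_new_test)
acc_before = accuracy_score(y_new_test, y_pred_before)

# Combine old data with the training portion of the new data
X_combined = np.vstack((X_old, X_new_train))
y_combined = np.hstack((y_old, y_new_train))

# Retrain a fresh model of the same type on the combined data
model_retrained = LogisticRegression(max_iter=200)
model_retrained.fit(X_combined, y_combined)

# Evaluate the retrained model on the same held-out new data
y_pred_after = model_retrained.predict(X_new_test)
acc_after = accuracy_score(y_new_test, y_pred_after)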

print(f"Accuracy before retraining on new data: {acc_before:.2f}")
print(f"Accuracy after retraining on new data: {acc_after:.2f}")

Added new data to the training set to include recent examples.

Retrained the model on combined old and new data to improve generalization.

Kept the same model architecture and training iterations.
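
One variation on combined retraining, not part of the original solution, is to give new examples more weight than old ones so the smaller new dataset is not drowned out. The sketch below assumes synthetic data like the solution's and an arbitrary weight of 3 for new examples (`NEW_WEIGHT` is a hypothetical name); it relies on the `sample_weight` argument of `LogisticRegression.fit`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the old and new data (assumed, for illustration)
rng = np.random.default_rng(0)
X_old = rng.normal(size=(1000, 10))
y_old = (X_old[:, 0] + X_old[:, 1] > 0).astype(int)
X_new = rng.normal(size=(300, 10)) + 0.5
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)

# Keep some new data aside for evaluation
X_new_train, X_new_test, y_new_train, y_new_test = train_test_split(
    X_new, y_new, test_size=0.3, random_state=0
)

X_combined = np.vstack((X_old, X_new_train))
y_combined = np.hstack((y_old, y_new_train))

# Assumed weighting: each new example counts 3x as much as an old one
NEW_WEIGHT = 3.0
weights = np.concatenate(
    (np.ones(len(X_old)), np.full(len(X_new_train), NEW_WEIGHT))
)

# Same model type and iteration budget as the solution
model_weighted = LogisticRegression(max_iter=200)
model_weighted.fit(X_combined, y_combined, sample_weight=weights)

acc_weighted = accuracy_score(y_new_test, model_weighted.predict(X_new_test))
print(f"Weighted retraining, accuracy on held-out new data: {acc_weighted:.2f}")
```

The weight is a tuning knob: higher values favor the new data at the possible cost of accuracy on the old data.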

Results Interpretation

Before retraining: Accuracy on new data was 65%, showing poor adaptation.

After retraining: Accuracy improved to 82%, showing better generalization to new data.

These figures describe the exercise scenario. The synthetic script in the solution prints whatever accuracies its simulated data yields, and because the old and new labels there follow the same rule, the two printed numbers may be much closer together.

Retraining with updated data helps models adapt to changes and improves performance on new data without changing the model structure.
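
To see whether adapting to new data costs anything on old data, both models can be scored on held-out splits of each dataset. This is a minimal sketch assuming synthetic data like the solution's; the variable names and split sizes are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the old and new data (assumed, for illustration)
rng = np.random.default_rng(0)
X_old = rng.normal(size=(1000, 10))
y_old = (X_old[:, 0] + X_old[:, 1] > 0).astype(int)
X_new = rng.normal(size=(300, 10)) + 0.5
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)

# Hold out a test split from each dataset
Xo_tr, Xo_te, yo_tr, yo_te = train_test_split(X_old, y_old, test_size=0.2, random_state=0)
Xn_tr, Xn_te, yn_tr, yn_te = train_test_split(X_new, y_new, test_size=0.3, random_state=0)

# Model trained on old data only vs. model retrained on old + new
old_only = LogisticRegression(max_iter=200).fit(Xo_tr, yo_tr)
retrained = LogisticRegression(max_iter=200).fit(
    np.vstack((Xo_tr, Xn_tr)), np.hstack((yo_tr, yn_tr))
)

# Score each model on held-out old data and held-out new data
results = {}
for name, m in [("old-only", old_only), ("retrained", retrained)]:
    acc_old = accuracy_score(yo_te, m.predict(Xo_te))
    acc_new = accuracy_score(yn_te, m.predict(Xn_te))
    results[name] = (acc_old, acc_new)
    print(f"{name}: old-data accuracy = {acc_old:.2f}, new-data accuracy = {acc_new:.2f}")
```

Reporting both columns makes any trade-off visible: a retraining strategy is only an improvement if the gain on new data does not come with an unacceptable loss on old data.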

Bonus Experiment

Try fine-tuning the model by training only on new data starting from the old model's parameters.

💡 Hint

Use a smaller learning rate and fewer iterations to avoid forgetting old knowledge.
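
A possible starting point for the bonus experiment, as a sketch rather than a reference solution: scikit-learn's `LogisticRegression` accepts `warm_start=True`, which makes the next `fit` call start from the current coefficients. Its default solver has no learning-rate parameter, so this sketch only limits the number of iterations; controlling the learning rate would need a gradient-based trainer such as `SGDClassifier`. The data is synthetic and the iteration budget of 10 is an assumed value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the old and new data (assumed, for illustration)
rng = np.random.default_rng(0)
X_old = rng.normal(size=(1000, 10))
y_old = (X_old[:, 0] + X_old[:, 1] > 0).astype(int)
X_new = rng.normal(size=(300, 10)) + 0.5
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)

X_new_train, X_new_test, y_new_train, y_new_test = train_test_split(
    X_new, y_new, test_size=0.3, random_state=0
)

# warm_start=True: the next fit() starts from the current coefficients
model = LogisticRegression(max_iter=200, warm_start=True)
model.fit(X_old, y_old)
coef_old = model.coef_.copy()
acc_before = accuracy_score(y_new_test, model.predict(X_new_test))

# Fine-tune on new data only, with a small iteration budget (assumed value);
# scikit-learn may emit a ConvergenceWarning here, which is expected
model.set_params(max_iter=10)
model.fit(X_new_train, y_new_train)
acc_finetuned = accuracy_score(y_new_test, model.predict(X_new_test))

# How far the coefficients moved away from the old model
drift = np.linalg.norm(model.coef_ - coef_old)

print(f"Accuracy on held-out new data before fine-tuning: {acc_before:.2f}")
print(f"Accuracy on held-out new data after fine-tuning: {acc_finetuned:.2f}")
print(f"Coefficient change (L2 norm): {drift:.3f}")
```

Scoring the fine-tuned model on held-out old data as well would show how much old knowledge was kept.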