What is Model Drift: Definition, Examples, and When to Use
model no longer predicts accurately because the real-world data distribution has shifted.How It Works
Imagine you trained a model to predict if emails are spam based on patterns in the words used. Over time, spammers change their tactics and use different words or styles. This change in the email content is like the data shifting, and the model's old rules don't work as well anymore. This is called model drift.
Model drift happens because the world changes. The data your model learned from is no longer the same as the data it sees when making predictions. Just like how a weather forecast model trained on last year's weather might not predict well if climate patterns change, machine learning models can lose accuracy if the input data changes.
Example
This example shows a simple model trained on data with one pattern, then tested on new data with a shifted pattern, causing accuracy to drop.
import numpy as np from sklearn.linear_model import LogisticRegression from sklearn.metrics import accuracy_score # Training data: feature mostly 0 or 1, label same as feature X_train = np.array([[0], [0], [1], [1]]) y_train = np.array([0, 0, 1, 1]) # Train model model = LogisticRegression().fit(X_train, y_train) # Test data with same pattern (no drift) X_test_same = np.array([[0], [1], [0], [1]]) y_test_same = np.array([0, 1, 0, 1]) # Test data with drift: feature flipped (0->1, 1->0) X_test_drift = np.array([[1], [0], [1], [0]]) y_test_drift = np.array([0, 1, 0, 1]) # Predictions and accuracy pred_same = model.predict(X_test_same) acc_same = accuracy_score(y_test_same, pred_same) pred_drift = model.predict(X_test_drift) acc_drift = accuracy_score(y_test_drift, pred_drift) print(f"Accuracy without drift: {acc_same:.2f}") print(f"Accuracy with drift: {acc_drift:.2f}")
When to Use
Understanding model drift is important when your model works in a changing environment. For example, fraud detection models must adapt as fraudsters change tactics. Customer behavior models need updates as preferences evolve. Monitoring for drift helps keep models accurate and reliable.
Use drift detection when your model's input data or environment can change over time. This helps you know when to retrain or update your model to keep good performance.
Key Points
- Model drift means the model's accuracy drops because data changes.
- It happens when the real-world data distribution shifts from training data.
- Detecting drift helps decide when to retrain models.
- Common in areas like finance, healthcare, and marketing.