Model drift detection identifies when a machine learning model's performance degrades because the data it sees in production no longer matches the data it was trained on. Catching drift early keeps the model useful over time.
Model Drift Detection in Machine Learning (Python)
Introduction
When a model is used in production and the data changes over time, such as weather or sales data.
When you want to keep a recommendation system accurate as user preferences change.
When monitoring fraud detection models because fraud patterns evolve.
When deploying models in production to catch performance drops early.
When retraining models regularly to decide if retraining is needed.
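All of these scenarios share one pattern: periodically compare a recent window of data against a reference window. Here is a minimal sketch of that check using a two-sample Kolmogorov-Smirnov test; the `drift_check` helper, the window sizes, and the simulated shift are illustrative assumptions, not part of any library.

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_check(reference, recent, alpha=0.05):
    # Two-sample KS test: a small p-value means the two samples
    # are unlikely to come from the same distribution
    stat, p_value = ks_2samp(reference, recent)
    return p_value < alpha

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 1000)  # data resembling the training set
shifted = rng.normal(0.8, 1.0, 1000)    # recent data with a mean shift

print(drift_check(reference, reference))  # False: identical samples
print(drift_check(reference, shifted))    # True: clear distribution shift
```

In a real pipeline this check would run on a schedule (for example, once per day) against the latest batch of production data.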
Syntax
Python

from sklearn.metrics import accuracy_score

# Compare model performance on old and new data
old_accuracy = accuracy_score(y_true_old, y_pred_old)
new_accuracy = accuracy_score(y_true_new, y_pred_new)

# Flag drift if accuracy dropped by more than an acceptable margin
threshold = 0.05
if new_accuracy < old_accuracy - threshold:
    print('Model drift detected')
This example uses accuracy to detect drift by comparing old and new data performance.
You can use other metrics or statistical tests depending on your problem.
Examples
Using F1 score instead of accuracy to detect drift, useful for imbalanced data.
Python

from sklearn.metrics import f1_score

old_f1 = f1_score(y_true_old, y_pred_old)
new_f1 = f1_score(y_true_new, y_pred_new)

if new_f1 < old_f1 - 0.05:
    print('Model drift detected')
Using a statistical test (Kolmogorov-Smirnov) to detect whether the input feature distribution has changed.
Python

from scipy.stats import ks_2samp

# Compare a feature's distribution between old and new data
stat, p_value = ks_2samp(feature_old, feature_new)

if p_value < 0.05:
    print('Feature distribution drift detected')
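Real datasets usually have many features, so the KS test is typically run once per feature. A common refinement, sketched below, is a Bonferroni correction so that testing many features does not inflate false alarms; the synthetic data and the injected shift in one column are purely for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
n_features = 5
X_old = rng.normal(0.0, 1.0, (1000, n_features))
X_new = X_old.copy()
X_new[:, 2] += 0.5  # inject drift into a single feature

alpha = 0.05 / n_features  # Bonferroni correction for multiple tests
drifted = []
for i in range(n_features):
    stat, p_value = ks_2samp(X_old[:, i], X_new[:, i])
    if p_value < alpha:
        drifted.append(i)

print('Drifted features:', drifted)  # prints: Drifted features: [2]
```

Reporting which features drifted, not just that drift occurred, makes it much easier to diagnose the cause.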
Sample Model
This program trains a simple model, then evaluates it on the original test data and on artificially shifted data, detecting drift from the drop in accuracy.
Python

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

# Create initial data
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)

# Split into train and old test data
X_train, X_old_test, y_train, y_old_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on old test data
y_pred_old = model.predict(X_old_test)
old_accuracy = accuracy_score(y_old_test, y_pred_old)

# Simulate new data with drift by shifting the feature distribution
X_new_test = X_old_test + np.random.normal(0.5, 1.0, X_old_test.shape)

# Predict on new test data
y_pred_new = model.predict(X_new_test)
new_accuracy = accuracy_score(y_old_test, y_pred_new)

# Set threshold for drift detection
threshold = 0.05

# Detect drift
if new_accuracy < old_accuracy - threshold:
    print('Model drift detected')
else:
    print('No model drift detected')

# Print accuracies
print(f'Old accuracy: {old_accuracy:.3f}')
print(f'New accuracy: {new_accuracy:.3f}')
Important Notes
Model drift means the model's performance degrades because the input data has changed (data drift) or because the relationship between inputs and targets has changed (concept drift).
Detecting drift early helps keep models accurate and trustworthy.
Use simple metrics or statistical tests depending on your data and problem.
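One widely used statistic beyond accuracy comparisons and KS tests is the Population Stability Index (PSI), which compares binned feature distributions; values above roughly 0.2 are commonly read as significant drift. The `psi` helper below is an illustrative sketch written from the standard formula, not a library function.

```python
import numpy as np

def psi(reference, recent, bins=10):
    # Bin edges from the reference distribution's quantiles
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    # Widen the outer edges so every recent value falls in a bin
    edges[0] -= 1e9
    edges[-1] += 1e9
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    new_frac = np.histogram(recent, edges)[0] / len(recent)
    # Clip to avoid division by zero / log(0) in empty bins
    ref_frac = np.clip(ref_frac, 1e-6, None)
    new_frac = np.clip(new_frac, 1e-6, None)
    return float(np.sum((new_frac - ref_frac) * np.log(new_frac / ref_frac)))

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, 5000)
shifted = rng.normal(0.5, 1.0, 5000)

print(f'PSI (same data): {psi(reference, reference):.3f}')
print(f'PSI (shifted data): {psi(reference, shifted):.3f}')
```

PSI is popular in practice because it works on one feature at a time and needs no labels, so it can run on production inputs long before ground truth arrives.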
Summary
Model drift detection checks if a model's performance drops over time.
It helps decide when to retrain or update the model.
Common methods compare old and new data predictions or feature distributions.