
Model drift detection in ML Python

Introduction

Model drift detection helps us know when a machine learning model stops working well because things have changed. It keeps the model useful over time.

Typical situations where drift detection matters:

When a model runs in production and the data changes over time, such as weather or sales data.
When you want to keep a recommendation system accurate as user preferences change.
When monitoring fraud detection models, because fraud patterns evolve.
When deploying models to production, to catch performance drops early.
When retraining models on a schedule, to decide whether a retrain is actually needed.
Syntax
ML Python
from sklearn.metrics import accuracy_score

# Compare old and new data predictions
old_accuracy = accuracy_score(y_true_old, y_pred_old)
new_accuracy = accuracy_score(y_true_new, y_pred_new)

# Check if accuracy dropped significantly
threshold = 0.05  # maximum acceptable accuracy drop before flagging drift
if new_accuracy < old_accuracy - threshold:
    print('Model drift detected')

This example uses accuracy to detect drift by comparing old and new data performance.

You can use other metrics or statistical tests depending on your problem.
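One such alternative is the Population Stability Index (PSI), which compares binned distributions of a feature between a reference and a current sample. The sketch below is illustrative and not part of the original lesson; the implementation details (quantile binning, the small clip to avoid log-of-zero, and the rule of thumb that PSI above roughly 0.2 signals drift) are common conventions, not a fixed standard.

ML Python
```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a current sample."""
    # Bin edges from the reference sample's quantiles
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Assign each point to a bin, clipping outliers into the end bins
    e_idx = np.clip(np.searchsorted(edges, expected) - 1, 0, bins - 1)
    a_idx = np.clip(np.searchsorted(edges, actual) - 1, 0, bins - 1)
    e_pct = np.bincount(e_idx, minlength=bins) / len(expected)
    a_pct = np.bincount(a_idx, minlength=bins) / len(actual)
    # Avoid division by zero and log of zero in sparse bins
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
old = rng.normal(0.0, 1.0, 1000)
new = rng.normal(0.5, 1.0, 1000)  # simulated drift: mean shifted by 0.5
print(f'PSI: {psi(old, new):.3f}')
```

A common guideline treats PSI below 0.1 as stable and above about 0.2 as drifted, but the cutoff should be tuned to your data.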

Examples
Using the F1 score instead of accuracy to detect drift, which is more reliable for imbalanced data.
ML Python
from sklearn.metrics import f1_score

old_f1 = f1_score(y_true_old, y_pred_old)
new_f1 = f1_score(y_true_new, y_pred_new)

if new_f1 < old_f1 - 0.05:
    print('Model drift detected')
Using a statistical test (Kolmogorov-Smirnov) to detect if input data changed.
ML Python
from scipy.stats import ks_2samp

# Compare feature distributions
stat, p_value = ks_2samp(feature_old, feature_new)

if p_value < 0.05:
    print('Feature distribution drift detected')
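Real datasets usually have many features, so the test above is typically run per column. This sketch (the simulated data and the Bonferroni-corrected significance level are my assumptions, not from the original lesson) flags every feature whose distribution changed:

ML Python
```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
X_old = rng.normal(0.0, 1.0, (500, 3))
X_new = X_old.copy()
X_new[:, 1] += 0.8  # simulate drift in one feature only

n_features = X_old.shape[1]
drifted = []
for i in range(n_features):
    stat, p_value = ks_2samp(X_old[:, i], X_new[:, i])
    # Bonferroni correction: divide alpha by the number of tests
    if p_value < 0.05 / n_features:
        drifted.append(i)

print('Drifted features:', drifted)
```

Without the correction, testing many features at the 0.05 level would produce frequent false alarms by chance alone.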
Sample Model

This program trains a simple model, evaluates it on the original test data and on artificially shifted data, and detects drift as a drop in accuracy.

ML Python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

# Create initial data
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)

# Split into train and old test data
X_train, X_old_test, y_train, y_old_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on old test data
y_pred_old = model.predict(X_old_test)
old_accuracy = accuracy_score(y_old_test, y_pred_old)

# Simulate new data with drift by changing feature distribution
X_new_test = X_old_test + np.random.normal(0.5, 1.0, X_old_test.shape)

# Predict on drifted data (labels unchanged, only features shifted)
y_pred_new = model.predict(X_new_test)
new_accuracy = accuracy_score(y_old_test, y_pred_new)

# Set threshold for drift detection
threshold = 0.05

# Detect drift
if new_accuracy < old_accuracy - threshold:
    print('Model drift detected')
else:
    print('No model drift detected')

# Print accuracies
print(f'Old accuracy: {old_accuracy:.3f}')
print(f'New accuracy: {new_accuracy:.3f}')
Important Notes

Model drift means a model's performance degrades because the data it sees no longer matches the data it was trained on.

Detecting drift early helps keep models accurate and trustworthy.

Use simple metrics or statistical tests depending on your data and problem.

Summary

Model drift detection checks if a model's performance drops over time.

It helps decide when to retrain or update the model.

Common methods compare old and new data predictions or feature distributions.
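The retrain-or-not decision can be sketched as a small helper around the accuracy comparison used throughout this lesson. The `check_and_retrain` name, the 0.05 threshold, and the demo data are illustrative assumptions, not a standard API:

ML Python
```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def check_and_retrain(model, X_new, y_new, baseline_acc, threshold=0.05):
    """Return (model, new_accuracy); refit a fresh copy if drift is detected."""
    acc = accuracy_score(y_new, model.predict(X_new))
    if acc < baseline_acc - threshold:
        model = clone(model).fit(X_new, y_new)  # retrain on recent data
    return model, acc

# Demo: train on initial data, then check against shifted data
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
baseline = accuracy_score(y, model.predict(X))

rng = np.random.default_rng(0)
X_shift = X + rng.normal(0.5, 1.0, X.shape)  # simulate feature drift
model, acc = check_and_retrain(model, X_shift, y, baseline)
print(f'baseline={baseline:.3f}, new={acc:.3f}')
```

In a real pipeline this check would run on a schedule against fresh labeled data, with the threshold chosen from how much accuracy loss the application can tolerate.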