ML Pythonml~20 mins

Model drift detection in ML Python - ML Experiment: Train & Evaluate

Choose your learning style9 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Experiment - Model drift detection

Problem:You have a classification model trained on old data. Over time, the data changes and the model's predictions become less accurate. This is called model drift. You want to detect when the model drift happens.

Current Metrics:Initial model accuracy on old data: 90%. Accuracy on new data: 70%. No drift detection implemented.

Issue:The model performs well on old data but poorly on new data. There is no system to detect when the model's performance drops due to data changes.

Your Task

Implement a simple model drift detection method that compares the distribution of new incoming data with the training data. The goal is to detect drift before accuracy drops below 80%.

Use only Python and standard libraries like numpy, pandas, sklearn.

Do not retrain the model in this task.

Focus on detecting drift using statistical tests or distance metrics.

Hint 1

Hint 2

Hint 3

Solution

ML Python

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from scipy.stats import ks_2samp

n_features = 5

# Generate old training data
X_old, y_old = make_classification(n_samples=1000, n_features=n_features, random_state=42)

# Generate new data with drift (shift in feature distribution)
X_new, y_new = make_classification(n_samples=300, n_features=n_features, random_state=24)
X_new += 0.5  # Shift features to simulate drift

# Function to detect drift using KS test
# Returns True if drift detected

def detect_drift(X_ref, X_test, alpha=0.05):
    drift_flags = []
    for i in range(X_ref.shape[1]):
        stat, p_value = ks_2samp(X_ref[:, i], X_test[:, i])
        drift_flags.append(p_value < alpha)
    # If any feature shows drift, flag drift
    return any(drift_flags)

# Detect drift
is_drift = detect_drift(X_old, X_new)
print(f"Drift detected: {is_drift}")

Added a function to compare feature distributions using the Kolmogorov-Smirnov test.

Simulated new data with shifted features to represent drift.

Implemented a simple rule to flag drift if any feature distribution changes significantly.

Results Interpretation

Before: No drift detection, model accuracy dropped from 90% to 70% on new data.

After: Drift detection flagged data change before accuracy dropped below 80%, allowing timely intervention.

Model drift detection helps identify when input data changes enough to affect model performance. Using simple statistical tests on features can alert us early to retrain or update the model.

Bonus Experiment

Try using the Jensen-Shannon divergence to measure drift instead of the KS test.

💡 Hint

Calculate probability distributions of features using histograms and then compute Jensen-Shannon divergence to quantify differences.