0
0
MLOpsdevops~10 mins

Data drift detection in MLOps - Step-by-Step Execution

Choose your learning style9 modes available
Process Flow - Data drift detection
Collect baseline data
Train model on baseline
Collect new incoming data
Compare new data to baseline
Calculate drift metrics
Is drift above threshold?
NoContinue monitoring
Yes
Trigger alert or retrain model
This flow shows how data drift detection compares new data to baseline data, calculates drift, and triggers alerts if drift is significant.
Execution Sample
MLOps
baseline = load_data('baseline.csv')
new_data = load_data('new.csv')
drift_score = calculate_drift(baseline, new_data)
if drift_score > 0.3:
    alert('Data drift detected')
This code loads baseline and new data, calculates a drift score, and alerts if the score exceeds 0.3.
Process Table
StepActionData ComparedDrift ScoreDecisionOutput
1Load baseline databaseline.csv-N/ABaseline data loaded
2Load new datanew.csv-N/ANew data loaded
3Calculate drift scorebaseline vs new0.250.25 <= 0.3No alert
4Calculate drift scorebaseline vs new0.350.35 > 0.3Alert triggered: Data drift detected
💡 Execution stops after alert triggered or no alert if drift score is below threshold
Status Tracker
VariableStartAfter Step 1After Step 2After Step 3After Step 4
baselineNoneLoaded baseline.csv dataLoaded baseline.csv dataLoaded baseline.csv dataLoaded baseline.csv data
new_dataNoneNoneLoaded new.csv dataLoaded new.csv dataLoaded new.csv data
drift_scoreNoneNoneNone0.25 or 0.35 depending on data0.25 or 0.35 depending on data
Key Moments - 2 Insights
Why do we compare new data to baseline data instead of just looking at new data alone?
Because drift means the new data distribution changes compared to the baseline. The execution_table shows step 3 where the comparison happens to calculate drift_score.
What does the drift_score represent and why is there a threshold?
Drift_score quantifies how much new data differs from baseline. The threshold (0.3) in step 4 decides if the difference is significant enough to alert.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table, what is the drift_score at step 3?
A0.3
B0.25
C0.35
DNo score calculated
💡 Hint
Check the 'Drift Score' column in row with Step 3 in execution_table
At which step does the system decide to trigger an alert?
AStep 2
BStep 3
CStep 4
DNo alert triggered
💡 Hint
Look at the 'Decision' and 'Output' columns in execution_table rows
If the threshold was changed to 0.2, how would the alert behavior change?
AAlert triggers at drift_score 0.25
BAlert triggers only at drift_score 0.35
CNo alerts would trigger
DAlerts trigger at any drift_score
💡 Hint
Compare threshold logic in code and execution_table decisions
Concept Snapshot
Data drift detection compares new data to baseline data.
Calculate a drift score to measure difference.
If drift score > threshold, trigger alert or retrain.
Common threshold example: 0.3.
Helps keep ML models accurate over time.
Full Transcript
Data drift detection is a process where we first collect baseline data and train a model on it. Then, as new data comes in, we compare it to the baseline data to see if the data distribution has changed. We calculate a drift score to quantify this difference. If the drift score is above a set threshold, for example 0.3, we trigger an alert to notify that data drift has occurred. This helps us know when to retrain or adjust our machine learning models to keep them accurate. The execution steps include loading baseline and new data, calculating the drift score, and deciding whether to alert based on the threshold.