Data drift detection basics in MLOps - Step-by-Step Execution

Process Flow - Data drift detection basics

Collect baseline data

↓

Train model on baseline

↓

Collect new incoming data

↓

Compare new data to baseline

↓

Calculate drift metrics

↓

Is drift above threshold?

No → Continue monitoring

Yes ↓

Trigger alert or retrain model

This flow shows how data drift detection compares new data to baseline data, calculates metrics, and triggers alerts if drift is detected.
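The flow above can be sketched as a small loop over incoming batches. This is only an illustration: the helper names `drift_metric` and `monitor`, the second batch, and the decision strings are assumptions made for the example, and the metric is the same average absolute difference used elsewhere on this page.

```python
# Hypothetical helper names; the metric is the average absolute
# difference between paired baseline and new values.
def drift_metric(baseline, new):
    """Average absolute difference between paired values."""
    return sum(abs(n - b) for n, b in zip(new, baseline)) / len(baseline)


def monitor(baseline, batches, threshold):
    """Walk the flow once per incoming batch and record the decision."""
    decisions = []
    for batch in batches:
        drift = drift_metric(baseline, batch)        # calculate drift metric
        if drift > threshold:                        # is drift above threshold?
            decisions.append("alert or retrain")     # Yes branch
        else:
            decisions.append("continue monitoring")  # No branch
    return decisions


baseline = [10, 12, 11, 13, 12]
batches = [
    [10, 15, 11, 14, 20],  # drift 2.4, below the threshold
    [18, 20, 19, 25, 22],  # drift 9.2, above the threshold (made-up batch)
]
decisions = monitor(baseline, batches, threshold=3)
print(decisions)  # ['continue monitoring', 'alert or retrain']
```

In a real pipeline the "alert or retrain" branch would call whatever notification or retraining job the team uses; here it is just a label.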

Execution Sample

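# Baseline values and newly observed values, paired by position
baseline_data = [10, 12, 11, 13, 12]
new_data = [10, 15, 11, 14, 20]
# Absolute difference for each pair
differences = [abs(n - b) for n, b in zip(new_data, baseline_data)]
total_difference = sum(differences)
# Drift metric: average absolute difference
drift = total_difference / len(baseline_data)
threshold = 3
# Alert only when drift exceeds the threshold
alert = drift > threshold
print(alert)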

This code computes the average absolute difference between paired baseline and new values as a simple drift metric, then prints whether an alert is triggered.

Process Table

Step	Action	Calculation	Value	Result
1	Calculate absolute differences	abs(10-10), abs(15-12), abs(11-11), abs(14-13), abs(20-12)	[0, 3, 0, 1, 8]	List of differences
2	Sum differences	0 + 3 + 0 + 1 + 8	12	Total difference
3	Calculate average difference	12 / 5	2.4	Drift metric
4	Compare drift to threshold	2.4 > 3	False	No alert triggered

💡 Drift 2.4 is less than threshold 3, so no alert is triggered.
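To see how much the outcome depends on the chosen threshold, the sketch below reruns the same calculation against two thresholds. The value 2 is an illustrative assumption; only 3 comes from the example above.

```python
baseline_data = [10, 12, 11, 13, 12]
new_data = [10, 15, 11, 14, 20]

differences = [abs(n - b) for n, b in zip(new_data, baseline_data)]
drift = sum(differences) / len(baseline_data)  # 12 / 5 = 2.4

# Threshold 2 is illustrative; 3 is the threshold used in the example.
results = {threshold: drift > threshold for threshold in (2, 3)}
print(results)  # {2: True, 3: False}
```

The same drift value of 2.4 triggers an alert at threshold 2 but not at threshold 3, so the threshold should be chosen with the data's normal variation in mind.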

Status Tracker

Variable	Start	After Step 1	After Step 2	After Step 3	After Step 4
baseline_data	[10,12,11,13,12]	[10,12,11,13,12]	[10,12,11,13,12]	[10,12,11,13,12]	[10,12,11,13,12]
new_data	[10,15,11,14,20]	[10,15,11,14,20]	[10,15,11,14,20]	[10,15,11,14,20]	[10,15,11,14,20]
differences	N/A	[0,3,0,1,8]	[0,3,0,1,8]	[0,3,0,1,8]	[0,3,0,1,8]
total_difference	N/A	N/A	12	12	12
drift	N/A	N/A	N/A	2.4	2.4
threshold	3	3	3	3	3
alert	N/A	N/A	N/A	N/A	False

Key Moments - 3 Insights

Why do we calculate the average difference instead of just the sum?

What does it mean if the alert is False even though differences exist?

Why do we compare new data to baseline data?

Visual Quiz - 3 Questions

Test your understanding

Look at the Process Table at step 3. What drift value is calculated?

B. 12

C. 2.4

D. False

Concept Snapshot

Data drift detection compares new data to baseline data.
Calculate a drift metric (e.g., average absolute difference).
Set a threshold to decide if drift is significant.
If drift > threshold, trigger alert or retrain.
This helps keep ML models accurate over time.

Full Transcript

Data drift detection basics involve comparing new incoming data to the original baseline data used to train a model. We calculate a drift metric, such as the average absolute difference between the new and baseline data points. This metric is then compared to a set threshold. If the drift exceeds the threshold, it indicates that the data distribution has changed significantly, and an alert is triggered to notify that the model may need retraining. This process helps maintain model accuracy by detecting when the data environment changes.
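The transcript talks about the data distribution changing, while the average absolute difference compares values pair by pair and so depends on their order. As a rough alternative, the sketch below compares summary statistics instead: it measures how far the new mean has moved from the baseline mean, in units of the baseline standard deviation. This is a swapped-in technique for illustration, not the method used on this page; the function name `mean_shift` and the threshold of 1.0 are assumptions.

```python
from statistics import mean, stdev


def mean_shift(baseline, new):
    """Shift of the new mean from the baseline mean, in baseline standard deviations."""
    return abs(mean(new) - mean(baseline)) / stdev(baseline)


baseline_data = [10, 12, 11, 13, 12]
new_data = [10, 15, 11, 14, 20]

shift = mean_shift(baseline_data, new_data)
alert = shift > 1.0  # illustrative threshold, not taken from the page
print(round(shift, 2), alert)
```

With these numbers the new mean sits roughly two baseline standard deviations away, so this check raises an alert even though the pairwise metric with threshold 3 does not. Different metrics and thresholds can disagree on the same data, which is why the choice of both matters.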

Practice

1. What is the main purpose of data drift detection in machine learning?

easy

A. To check if new data differs significantly from the training data

B. To improve the speed of model training

C. To reduce the size of the training dataset

D. To increase the number of features in the model

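Solution

Step 1: Understand data drift concept

Data drift means the new incoming data no longer looks like the baseline data the model was trained on.

Step 2: Identify the purpose

Drift detection compares new data to the baseline so that a significant change can trigger an alert or retraining. Training speed, dataset size, and feature count are not what it measures.

Final Answer:

A. To check if new data differs significantly from the training data

Quick Check:

Drift detection = checking new data against the training data.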
