MLOps · devops · ~30 mins

Data drift detection basics in MLOps - Mini Project: Build & Apply

Data drift detection basics

📖 Scenario: You work as a machine learning engineer. Your model uses sensor data to predict equipment failures. Over time, the distribution of incoming sensor data can shift away from what the model saw during training, which may reduce its accuracy. This shift is called data drift. Detecting it early helps keep the model reliable.

🎯 Goal: Build a simple Python script that detects data drift by comparing the distribution of new sensor data with the original training data.

📋 What You'll Learn

Create a dictionary called training_data with sensor readings as keys and their counts as values

Create a dictionary called new_data with sensor readings as keys and their counts as values

Create a variable called drift_threshold set to 0.2 (an absolute difference of 20 percentage points between proportions)

Calculate the total counts in training_data and new_data

Use a for loop with variables reading and count to iterate over training_data.items()

Calculate the proportion difference for each reading between training_data and new_data

Detect if any proportion difference exceeds drift_threshold

Print "Data drift detected" if drift is found, otherwise print "No data drift detected"
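The steps listed above can be collected into one small function. This is a minimal sketch, not the project's reference solution: the function name detect_drift and the sample counts in the usage lines are hypothetical, chosen only for illustration.

```python
def detect_drift(training_data, new_data, drift_threshold):
    """Return True if any reading's share of the data shifts by more than drift_threshold."""
    total_training = sum(training_data.values())
    total_new = sum(new_data.values())
    if total_training == 0 or total_new == 0:
        raise ValueError("both datasets need at least one observation")

    for reading, count in training_data.items():
        prop_training = count / total_training
        # A reading absent from new_data counts as a proportion of 0.
        prop_new = new_data.get(reading, 0) / total_new
        if abs(prop_training - prop_new) > drift_threshold:
            return True
    return False


# Hypothetical counts, for illustration only.
baseline = {"ok": 80, "warning": 20}
shifted = {"ok": 50, "warning": 50}   # "ok" share drops from 0.80 to 0.50
similar = {"ok": 78, "warning": 22}   # shares move by only 0.02

print(detect_drift(baseline, shifted, 0.2))  # True
print(detect_drift(baseline, similar, 0.2))  # False
```

Returning as soon as one reading exceeds the threshold is a design choice of this sketch; the project steps instead set a drift_detected flag inside the loop, which gives the same final answer.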

💡 Why This Matters

🌍 Real World

Detecting data drift helps maintain machine learning model accuracy by alerting engineers when input data changes significantly.

💼 Career

Data scientists and MLOps engineers use data drift detection to monitor models in production and trigger retraining or alerts.

Create the training data dictionary

Create a dictionary called training_data with these exact entries: "temp_high": 50, "temp_normal": 150, "temp_low": 30

# Create the training_data dictionary with exact entries
# Your code here

Hint

Use curly braces {} to create a dictionary with keys and values.

Create the new data dictionary and drift threshold

Create a dictionary called new_data with these exact entries: "temp_high": 100, "temp_normal": 90, "temp_low": 40. Then create a variable called drift_threshold and set it to 0.2

training_data = {"temp_high": 50, "temp_normal": 150, "temp_low": 30}
# Create new_data dictionary and drift_threshold variable
# Your code here

Hint

Remember to use the exact variable names and values given.

Calculate proportions and detect drift

Calculate the total counts in training_data and new_data using sum(). Set a variable drift_detected to False before the loop. Then use a for loop with variables reading and count to iterate over training_data.items(). Inside the loop, calculate the proportion of each reading in training_data and in new_data (the reading's count divided by that dataset's total). If the absolute difference between the two proportions is greater than drift_threshold, set drift_detected to True.

training_data = {"temp_high": 50, "temp_normal": 150, "temp_low": 30}
new_data = {"temp_high": 100, "temp_normal": 90, "temp_low": 40}
drift_threshold = 0.2

# Calculate totals and detect drift
# Your code here

Hint

Use new_data.get(reading, 0) to safely get counts from new_data.
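To illustrate the hint: dict.get(key, default) returns the default instead of raising KeyError when a key is absent, so a reading that has vanished from the new data counts as a proportion of 0. The reverse case, a reading that appears only in the new data, is never visited by a loop over training_data.items(). One possible way to cover it is to loop over the union of keys, as sketched below. The counts and the temp_critical reading are hypothetical and not part of the project.

```python
# Hypothetical counts: "temp_low" is missing from new_data, "temp_critical" is new.
training_data = {"temp_high": 50, "temp_normal": 150, "temp_low": 30}
new_data = {"temp_high": 40, "temp_normal": 120, "temp_critical": 70}

# Prints 0; new_data["temp_low"] would raise KeyError instead.
print(new_data.get("temp_low", 0))

total_training = sum(training_data.values())
total_new = sum(new_data.values())

# Compare every reading seen in either dataset, not only the training keys.
differences = {}
for reading in sorted(set(training_data) | set(new_data)):
    prop_training = training_data.get(reading, 0) / total_training
    prop_new = new_data.get(reading, 0) / total_new
    differences[reading] = abs(prop_training - prop_new)

for reading, diff in differences.items():
    print(f"{reading}: {diff:.3f}")
```

With these made-up counts, only the unseen temp_critical reading moves by more than 0.2, so a loop restricted to the training keys would report no drift.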

Print the drift detection result

Write a print statement that prints "Data drift detected" if drift_detected is True. Otherwise, print "No data drift detected".

training_data = {"temp_high": 50, "temp_normal": 150, "temp_low": 30}
new_data = {"temp_high": 100, "temp_normal": 90, "temp_low": 40}
drift_threshold = 0.2

total_training = sum(training_data.values())
total_new = sum(new_data.values())
drift_detected = False
for reading, count in training_data.items():
    prop_training = count / total_training
    prop_new = new_data.get(reading, 0) / total_new
    if abs(prop_training - prop_new) > drift_threshold:
        drift_detected = True

# Print the result of drift detection
# Your code here

Hint

Use an if statement to check drift_detected and print the correct message.
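As a check on the arithmetic, the sketch below runs the finished script on the project's data and also records which readings crossed the threshold. The drifted list and the per-reading printout are additions for illustration, not part of the exercise.

```python
training_data = {"temp_high": 50, "temp_normal": 150, "temp_low": 30}
new_data = {"temp_high": 100, "temp_normal": 90, "temp_low": 40}
drift_threshold = 0.2

total_training = sum(training_data.values())  # 230
total_new = sum(new_data.values())            # 230

drifted = []  # illustrative addition: readings whose share moved too far
for reading, count in training_data.items():
    prop_training = count / total_training
    prop_new = new_data.get(reading, 0) / total_new
    diff = abs(prop_training - prop_new)
    print(f"{reading}: {prop_training:.3f} -> {prop_new:.3f} (diff {diff:.3f})")
    if diff > drift_threshold:
        drifted.append(reading)

drift_detected = bool(drifted)
if drift_detected:
    print("Data drift detected")
else:
    print("No data drift detected")
```

Here temp_high moves from about 0.217 to 0.435 and temp_normal from about 0.652 to 0.391, both shifts larger than 0.2, so the script prints "Data drift detected". temp_low changes by only about 0.043.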

Practice

1. What is the main purpose of data drift detection in machine learning? (easy)

A. To check if new data differs significantly from the training data

B. To improve the speed of model training

C. To reduce the size of the training dataset

D. To increase the number of features in the model

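Solution

Step 1: Understand data drift concept

Data drift is a change in the input data over time relative to the data the model was trained on.

Step 2: Identify the purpose

Drift detection compares new data with the training data to flag significant differences before they reduce model accuracy.

Final Answer: A. To check if new data differs significantly from the training data

Quick Check: Drift detection compares new data against training data, which matches option A.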
