MLOpsdevops~30 mins

Data drift detection in MLOps - Mini Project: Build & Apply

Choose your learning style10 modes available

Learn Why Deep Visual Try Challenge Project Recall Time

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Data Drift Detection

📖 Scenario: You work as a data engineer in a company that uses machine learning models to predict customer behavior. Over time, the data your model sees can change, which might make the model less accurate. This change is called data drift. Detecting data drift early helps keep the model reliable.

🎯 Goal: Build a simple Python program that detects data drift by comparing the distribution of a feature in new data against the original training data.

📋 What You'll Learn

Create a dictionary called training_data with feature values

Create a dictionary called new_data with feature values

Calculate the mean of the feature in both datasets

Set a threshold called drift_threshold to detect drift

Compare the means and print if data drift is detected or not

💡 Why This Matters

🌍 Real World

Data drift detection helps keep machine learning models accurate by alerting when input data changes significantly.

💼 Career

Data engineers and MLOps specialists use data drift detection to maintain and monitor deployed ML models in production.

Progress0 / 4 steps

Create training data dictionary

Create a dictionary called training_data with the key 'feature1' and the list of values [10, 12, 11, 13, 12].

MLOps

# Create the training_data dictionary with feature1 values
# Your code here

Hint

Use curly braces to create a dictionary and square brackets for the list of numbers.

Create new data dictionary

Create a dictionary called new_data with the key 'feature1' and the list of values [14, 15, 13, 16, 15].

MLOps

training_data = {'feature1': [10, 12, 11, 13, 12]}
# Create the new_data dictionary with feature1 values
# Your code here

Hint

Follow the same format as the training_data dictionary.

Calculate means and set threshold

Calculate the mean of feature1 in training_data and store it in mean_training. Calculate the mean of feature1 in new_data and store it in mean_new. Then, create a variable called drift_threshold and set it to 2.0.

MLOps

training_data = {'feature1': [10, 12, 11, 13, 12]}
new_data = {'feature1': [14, 15, 13, 16, 15]}
# Calculate mean_training, mean_new and set drift_threshold
# Your code here

Hint

Use sum() and len() functions to calculate the mean.

Detect and print data drift

Write an if statement to check if the absolute difference between mean_new and mean_training is greater than drift_threshold. If yes, print "Data drift detected". Otherwise, print "No data drift detected".

MLOps

training_data = {'feature1': [10, 12, 11, 13, 12]}
new_data = {'feature1': [14, 15, 13, 16, 15]}
mean_training = sum(training_data['feature1']) / len(training_data['feature1'])
mean_new = sum(new_data['feature1']) / len(new_data['feature1'])
drift_threshold = 2.0
# Check for data drift and print the result
# Your code here

Hint

Use abs() to get the absolute difference.

Practice

(1/5)

1. What is the main purpose of data drift detection in MLOps?

easy

A. To reduce the size of the dataset

B. To check if new data differs significantly from the training data

C. To improve the speed of model training

D. To increase the number of features in the model

Data drift detection in MLOps - Mini Project: Build & Apply

Start learning this pattern below

Practice

Solution

Step 1: Understand data drift concept

Step 2: Identify the purpose of detection

Final Answer:

Quick Check:

Solution

Step 1: Recall common MLOps tools

Step 2: Differentiate from other libraries

Final Answer:

Quick Check:

Solution

Step 1: Understand Evidently report usage

Step 2: Identify the purpose of the method

Final Answer:

Quick Check:

Solution

Step 1: Check Dashboard.run() method requirements

Step 2: Identify missing argument

Final Answer:

Quick Check:

Solution

Step 1: Understand automation in MLOps

Step 2: Identify best practice

Final Answer:

Quick Check: