
Data drift detection in MLOps - Mini Project: Build & Apply

Data Drift Detection
📖 Scenario: You work as a data engineer in a company that uses machine learning models to predict customer behavior. Over time, the data your model sees can change, which might make the model less accurate. This change is called data drift. Detecting data drift early helps keep the model reliable.
🎯 Goal: Build a simple Python program that detects data drift by comparing the mean of a feature in new data against its mean in the original training data.
📋 What You'll Learn
Create a dictionary called training_data with feature values
Create a dictionary called new_data with feature values
Calculate the mean of the feature in both datasets
Set a threshold called drift_threshold to detect drift
Compare the means and print whether data drift is detected
💡 Why This Matters
🌍 Real World
Data drift detection helps keep machine learning models accurate by alerting when input data changes significantly.
💼 Career
Data engineers and MLOps specialists use data drift detection to maintain and monitor deployed ML models in production.
1
Create training data dictionary
Create a dictionary called training_data with the key 'feature1' and the list of values [10, 12, 11, 13, 12].
Need a hint?

Use curly braces to create a dictionary and square brackets for the list of numbers.
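One possible way to write this step is sketched below, assuming plain Python with no extra libraries. The key and values are the ones given in the step description.

```python
# Baseline values the model was trained on, keyed by feature name.
training_data = {'feature1': [10, 12, 11, 13, 12]}

print(training_data)
```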

2
Create new data dictionary
Create a dictionary called new_data with the key 'feature1' and the list of values [14, 15, 13, 16, 15].
Need a hint?

Follow the same format as the training_data dictionary.
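A minimal sketch of this step, again assuming plain Python. The dictionary uses the same key as the training data so the two lists can be compared later.

```python
# Newly observed values for the same feature.
new_data = {'feature1': [14, 15, 13, 16, 15]}

print(new_data)
```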

3
Calculate means and set threshold
Calculate the mean of feature1 in training_data and store it in mean_training. Calculate the mean of feature1 in new_data and store it in mean_new. Then, create a variable called drift_threshold and set it to 2.0.
Need a hint?

Use sum() and len() functions to calculate the mean.
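The calculation might look like the sketch below. The two dictionaries from the earlier steps are repeated so the snippet runs on its own; the variable names are the ones the step asks for.

```python
# Data from the earlier steps, repeated so this snippet is self-contained.
training_data = {'feature1': [10, 12, 11, 13, 12]}
new_data = {'feature1': [14, 15, 13, 16, 15]}

# Mean = sum of the values divided by how many values there are.
mean_training = sum(training_data['feature1']) / len(training_data['feature1'])
mean_new = sum(new_data['feature1']) / len(new_data['feature1'])

# Largest difference between the means that is still treated as "no drift".
drift_threshold = 2.0

print(mean_training, mean_new, drift_threshold)
```

With these values, the training mean is 58 / 5 = 11.6 and the new mean is 73 / 5 = 14.6.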

4
Detect and print data drift
Write an if statement to check if the absolute difference between mean_new and mean_training is greater than drift_threshold. If yes, print "Data drift detected". Otherwise, print "No data drift detected".
Need a hint?

Use abs() to get the absolute difference.
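Putting the four steps together, the whole program could be sketched as follows. This is one possible solution under the tutorial's assumptions (a single feature, a mean comparison, and a fixed threshold), not the only valid one.

```python
# Step 1 and 2: baseline and newly observed values for the same feature.
training_data = {'feature1': [10, 12, 11, 13, 12]}
new_data = {'feature1': [14, 15, 13, 16, 15]}

# Step 3: means of the feature in each dataset, plus the drift threshold.
mean_training = sum(training_data['feature1']) / len(training_data['feature1'])
mean_new = sum(new_data['feature1']) / len(new_data['feature1'])
drift_threshold = 2.0

# Step 4: flag drift when the means differ by more than the threshold.
if abs(mean_new - mean_training) > drift_threshold:
    print("Data drift detected")
else:
    print("No data drift detected")
```

For the tutorial's values the means differ by about 3.0, which exceeds the threshold of 2.0, so the first branch runs.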