
Data drift detection in MLOps - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the primary goal of data drift detection in MLOps?

Data drift detection is crucial in machine learning operations. What is its main purpose?

A. To automate the deployment of new model versions
B. To improve the training speed of machine learning models
C. To monitor hardware resource usage during model training
D. To identify changes in input data distribution that may affect model performance
💡 Hint

Think about why monitoring data quality over time matters for a model's accuracy.

💻 Command Output
intermediate
Output of a data drift detection command

Given the following command output from a data drift tool, what does it indicate?

{"feature": "age", "drift_detected": true, "p_value": 0.01}
A. The feature 'age' has statistically significant drift with p-value 0.01
B. The feature 'age' was removed from the dataset
C. The model accuracy improved due to drift in 'age'
D. No drift detected for feature 'age' because p-value is above 0.05
💡 Hint

Recall that a p-value below 0.05 is conventionally treated as a statistically significant change.
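
To make the p-value reading concrete, here is a minimal sketch that parses a record shaped like the one in the question and applies a cutoff. The function name `interpret_drift_result` and the constant `ALPHA` are illustrative assumptions, not part of any particular drift tool.

```python
import json

# Assumed significance level; 0.05 is the conventional cutoff the hint refers to.
ALPHA = 0.05

def interpret_drift_result(raw: str, alpha: float = ALPHA) -> str:
    """Summarize one drift-test record of the form shown in the question."""
    record = json.loads(raw)
    feature = record["feature"]
    p_value = record["p_value"]
    # A small p-value means the observed difference would be unlikely
    # if both samples came from the same distribution.
    if p_value < alpha:
        return f"{feature}: drift detected (p={p_value} < {alpha})"
    return f"{feature}: no significant drift (p={p_value} >= {alpha})"

output = '{"feature": "age", "drift_detected": true, "p_value": 0.01}'
print(interpret_drift_result(output))  # age: drift detected (p=0.01 < 0.05)
```

A real tool may apply a different statistical test per feature type, so the cutoff is best treated as a configurable setting rather than a fixed rule.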

Configuration
advanced
Configuring a data drift detection threshold

Which configuration snippet correctly sets a data drift detection threshold to trigger alerts when p-value is below 0.05?

A
drift_detection:
  threshold: '0.05'
  alert: false
B
drift_detection:
  threshold: 0.05
  alert: true
C
drift_detection:
  threshold: 5
  alert: true
D
drift_detection:
  alert: true
  threshold: 0.5
💡 Hint

Threshold should be a decimal number representing p-value cutoff.
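
One way to reason about the four snippets is to check each against the stated requirement in code. The sketch below assumes the YAML has already been parsed into Python dictionaries (a quoted value such as '0.05' parses as a string, an unquoted one as a number). The validator `is_valid_alert_config` is hypothetical and not taken from any specific tool.

```python
def is_valid_alert_config(config: dict, expected_threshold: float = 0.05) -> bool:
    """Check a parsed drift_detection block against the stated requirement."""
    section = config.get("drift_detection", {})
    threshold = section.get("threshold")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    return (
        is_number
        and abs(threshold - expected_threshold) < 1e-12
        and section.get("alert") is True
    )

# The four options from the question, as a YAML parser would represent them.
options = {
    "A": {"drift_detection": {"threshold": "0.05", "alert": False}},
    "B": {"drift_detection": {"threshold": 0.05, "alert": True}},
    "C": {"drift_detection": {"threshold": 5, "alert": True}},
    "D": {"drift_detection": {"alert": True, "threshold": 0.5}},
}
valid = [name for name, cfg in options.items() if is_valid_alert_config(cfg)]
print(valid)
```

Key order inside the block does not matter, so an option is ruled out by its values, not by the order in which it lists them.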

🔀 Workflow
advanced
Correct sequence for data drift detection workflow

What is the correct order of steps in a typical data drift detection workflow?

A. 1, 2, 3, 4
B. 1, 3, 2, 4
C. 2, 1, 3, 4
D. 3, 1, 2, 4
💡 Hint

Think about the natural flow from data collection to action.

Troubleshoot
expert
Why does the data drift detection tool report no drift despite model accuracy dropping?

A deployed model's accuracy dropped significantly, but the data drift detection tool reports no drift. What is the most likely cause?

A. The model was retrained recently, so drift is ignored
B. The drift detection tool is malfunctioning and needs a restart
C. The drift detection only monitors input features, but the issue is with label distribution shift
D. The data drift detection threshold is set too low, causing false positives
💡 Hint

Consider what types of drift affect model accuracy but might not be detected by input data checks.
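
The scenario can be illustrated with a small, fully synthetic sketch: the input feature values stay identical, but the rule that generates the labels changes. The toy model, the thresholds 50 and 30, and the helper names are all assumptions made for illustration. The drift check used here is a plain two-sample Kolmogorov-Smirnov statistic on the inputs only.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def model(x):
    """Toy deployed model, fitted when the true rule was x >= 50."""
    return x >= 50

def accuracy(xs, labels):
    return sum(model(x) == y for x, y in zip(xs, labels)) / len(xs)

features = list(range(100))                      # same inputs before and after
reference_labels = [x >= 50 for x in features]   # rule at training time
current_labels = [x >= 30 for x in features]     # rule in production has moved

print(ks_statistic(features, features))          # 0.0 -> no input drift
print(accuracy(features, reference_labels))      # 1.0
print(accuracy(features, current_labels))        # 0.8
print(sum(current_labels) / len(current_labels)) # 0.7 positive rate, up from 0.5
```

An input-only monitor sees nothing unusual here, which is why label or prediction-quality monitoring is often run alongside input drift checks.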

Practice

1. What is the main purpose of data drift detection in MLOps?
easy
A. To reduce the size of the dataset
B. To check if new data differs significantly from the training data
C. To improve the speed of model training
D. To increase the number of features in the model

Solution

  1. Step 1: Understand data drift concept

    Data drift means the distribution of new data has changed relative to the data used to train the model.
  2. Step 2: Identify the purpose of detection

    Detecting data drift helps decide when to retrain or update the model to keep it accurate.
  3. Final Answer:

    To check if new data differs significantly from the training data -> Option B
  4. Quick Check:

    Data drift detection = check data difference [OK]
Hint: Data drift means new data differs from the data the model was trained on [OK]
Common Mistakes:
  • Confusing data drift with model speed optimization
  • Thinking data drift reduces dataset size
  • Assuming data drift adds features
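
As an illustration of checking whether new data differs from training data, the sketch below computes a Population Stability Index (PSI) for one numeric feature using only the standard library. The binning scheme, the `eps` floor for empty bins, and the sample data are assumptions. The 0.25 cutoff is a commonly quoted rule of thumb, not a universal standard.

```python
import math

def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two samples, using equal-width bins built from the reference."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

training = list(range(100))             # stand-in for a training feature
shifted = [v + 30 for v in training]    # new data with a shifted distribution

print(population_stability_index(training, training))        # 0.0
print(population_stability_index(training, shifted) > 0.25)  # True
```

A PSI near zero suggests the new batch looks like the training data, while larger values indicate that the feature's distribution has moved.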
2. Which Python library is commonly used for detecting data drift in MLOps?
easy
A. Flask
B. NumPy
C. Matplotlib
D. Evidently

Solution

  1. Step 1: Recall common MLOps tools

    Evidently is a popular tool designed specifically for monitoring data and model drift.
  2. Step 2: Differentiate from other libraries

    NumPy is for numerical computing, Matplotlib for plotting, and Flask for web apps; none of them is designed for drift detection.
  3. Final Answer:

    Evidently -> Option D
  4. Quick Check:

    Evidently = data drift detection tool [OK]
Hint: Evidently is made for data drift detection [OK]
Common Mistakes:
  • Choosing NumPy or Matplotlib which are not for drift detection
  • Mistaking Flask for a data tool
3. When using an Evidently report, what does report.run(reference_data, current_data) do?
medium
A. Visualize the model architecture
B. Train a new model on current_data
C. Compare current_data with reference_data to detect data drift
D. Delete old data from the system

Solution

  1. Step 1: Understand Evidently report usage

    The run method compares new data (current_data) against reference data to find differences.
  2. Step 2: Identify the purpose of the method

    It does not train models, visualize architecture, or delete data; it detects data drift.
  3. Final Answer:

    Compare current_data with reference_data to detect data drift -> Option C
  4. Quick Check:

    report.run compares data for drift [OK]
Hint: report.run compares new vs reference data [OK]
Common Mistakes:
  • Thinking it trains a model
  • Assuming it visualizes model structure
  • Believing it deletes data
4. You wrote this code to detect data drift but get an error:
from evidently.dashboard import Dashboard
dashboard = Dashboard(tabs=["data_drift"])
dashboard.run(current_data)
What is the likely mistake?
medium
A. Missing reference data argument in dashboard.run()
B. Incorrect import statement for Dashboard
C. Dashboard does not support data_drift tab
D. current_data is not a valid variable name

Solution

  1. Step 1: Check Dashboard.run() method requirements

    Dashboard.run() requires both reference and current datasets to compare for drift.
  2. Step 2: Identify missing argument

    Only current_data is passed; reference_data is missing, causing the error.
  3. Final Answer:

    Missing reference data argument in dashboard.run() -> Option A
  4. Quick Check:

    Dashboard.run needs reference and current data [OK]
Hint: Dashboard.run needs two datasets: reference and current [OK]
Common Mistakes:
  • Assuming import is wrong
  • Thinking data_drift tab is unsupported
  • Believing variable name causes error
5. You want to automate retraining your model when data drift is detected. Which approach best fits this goal?
hard
A. Set up a monitoring pipeline that runs data drift detection daily and triggers retraining if drift is found
B. Retrain the model every week regardless of data changes
C. Manually check data drift reports and retrain when you have time
D. Ignore data drift and only retrain when model accuracy drops

Solution

  1. Step 1: Understand automation in MLOps

    Automating retraining based on data drift ensures the model stays accurate without manual checks.
  2. Step 2: Identify best practice

    Running daily drift detection and triggering retraining only when drift occurs is efficient and effective.
  3. Final Answer:

    Set up a monitoring pipeline that runs data drift detection daily and triggers retraining if drift is found -> Option A
  4. Quick Check:

    Automate retrain on drift detection = best practice [OK]
Hint: Automate retrain triggered by drift detection [OK]
Common Mistakes:
  • Retraining blindly without checking data
  • Relying on manual checks only
  • Ignoring drift until accuracy drops
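
The monitoring approach can be sketched as a single scheduled run that checks for drift and retrains only when drift is found. Everything here is hypothetical: the mean-shift check stands in for a real drift test, `retrain` stands in for a training job, and the daily schedule itself would come from cron or an orchestrator rather than from this code.

```python
import statistics

def drift_detected(reference, current, max_shift=0.5):
    """Placeholder check: flag drift when the mean moves by more than
    max_shift reference standard deviations."""
    spread = statistics.pstdev(reference) or 1.0  # avoid dividing by zero
    shift = abs(statistics.fmean(current) - statistics.fmean(reference)) / spread
    return shift > max_shift

def run_daily_check(reference, current, retrain):
    """One scheduled run: detect drift and retrain only if it is found."""
    if drift_detected(reference, current):
        retrain(current)
        return "retrained"
    return "skipped"

retrain_calls = []                     # records which batches triggered retraining
reference = [20, 25, 30, 35, 40]
stable_batch = [21, 26, 29, 36, 39]
drifted_batch = [45, 50, 55, 60, 65]

print(run_daily_check(reference, stable_batch, retrain_calls.append))   # skipped
print(run_daily_check(reference, drifted_batch, retrain_calls.append))  # retrained
```

Passing `retrain` in as a callable keeps the drift check separate from the training job, so either side can be swapped without touching the other.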