Recall & Review
beginner
What is data drift in machine learning?
Data drift happens when the data your model sees changes over time compared to the data it was trained on. This can make the model less accurate.
beginner
Name one common method to detect data drift.
One common method is to compare statistical properties of new data, such as the mean or the full distribution, against the training data, using a test like the Kolmogorov-Smirnov test.
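As a minimal sketch of the Kolmogorov-Smirnov comparison, the block below computes the two-sample KS statistic for one numerical feature in plain Python and compares it with the large-sample critical value. The function names `ks_statistic` and `ks_drift_detected`, the sample data, and the 0.05 significance level are illustrative assumptions, not part of any particular library.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


def ks_drift_detected(reference, current, alpha=0.05):
    """Flag drift when the statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical_value = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical_value


# Illustrative data: the "current" feature is the training feature shifted upward.
training_feature = list(range(100))
current_feature = [value + 50 for value in training_feature]

same_data_drift = ks_drift_detected(training_feature, training_feature)
shifted_data_drift = ks_drift_detected(training_feature, current_feature)
```

In practice a statistics library would usually supply the test and its p-value; the hand-written version only shows what is being compared.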
beginner
Why is data drift detection important in production ML systems?
Detecting data drift helps keep models accurate by alerting when data changes. This allows teams to retrain or update models before performance drops.
beginner
What role does baseline data play in data drift detection?
Baseline data is the original data used to train the model. It serves as a reference to compare new incoming data to find any drift.
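One way to use baseline data as a reference, sketched here under simple assumptions, is to store a few summary statistics at training time and measure how far a new batch has moved from them. The helper names `build_baseline` and `mean_shift_in_std` and the sample values are hypothetical.

```python
import statistics


def build_baseline(values):
    """Summarise one training feature so later batches can be compared to it."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def mean_shift_in_std(baseline, new_values):
    """How far the new mean sits from the baseline mean, in baseline standard deviations."""
    if baseline["std"] == 0:
        raise ValueError("baseline feature is constant; mean shift is undefined")
    return abs(statistics.fmean(new_values) - baseline["mean"]) / baseline["std"]


# Illustrative values for a single numerical feature.
training_values = [10, 12, 11, 13, 9, 10, 12, 11]
new_values = [15, 16, 14, 17]

baseline = build_baseline(training_values)
shift = mean_shift_in_std(baseline, new_values)
```

A large shift suggests the new batch no longer resembles the baseline; what counts as "large" is a threshold each team has to choose.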
intermediate
Give an example of a tool or library used for data drift detection.
Tools like WhyLabs, Alibi Detect, or TensorFlow Data Validation help monitor and detect data drift automatically.
What does data drift affect in a machine learning model?
A. Model size
B. Model accuracy
C. Model training speed
D. Model architecture
Data drift changes the input data distribution, which can reduce model accuracy.
Which statistical test is commonly used to detect data drift?
A. ANOVA
B. T-test
C. Kolmogorov-Smirnov test
D. Chi-square test
The two-sample Kolmogorov-Smirnov test compares the distribution of a numerical feature in new data against its distribution in the baseline data to detect drift.
What is the first step in data drift detection?
A. Collect baseline data
B. Retrain the model
C. Deploy the model
D. Delete old data
Baseline data is needed to compare new data and detect drift.
Which of these is NOT a sign of data drift?
A. Increase in model training time
B. Sudden drop in model accuracy
C. Change in data distribution
D. New feature values outside training range
An increase in training time is unrelated to data drift; the other three options all reflect a change in the incoming data or its effect on predictions.
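The "values outside training range" sign can be checked directly. The sketch below, with a hypothetical helper `out_of_range_fraction` and made-up ages, reports what share of a new batch falls outside the minimum and maximum seen during training.

```python
def out_of_range_fraction(train_values, new_values):
    """Share of new values that fall outside the range seen in training."""
    low, high = min(train_values), max(train_values)
    outside = [value for value in new_values if value < low or value > high]
    return len(outside) / len(new_values)


# Illustrative feature: customer age, with training data covering 18 to 65.
train_ages = [18, 25, 33, 41, 52, 65]
new_ages = [25, 40, 70, 17]

fraction_outside = out_of_range_fraction(train_ages, new_ages)
```

A range check like this is cheap but coarse: it catches new extremes, not a shift that stays inside the old range.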
What action should you take after detecting data drift?
A. Change the model architecture
B. Ignore it
C. Delete the model
D. Retrain or update the model
Retraining helps the model adapt to new data patterns.
Explain what data drift is and why it matters in machine learning.
Think about how changing data affects predictions.
Describe a simple approach to detect data drift using statistical methods.
Focus on comparing old and new data.
Practice
1. What is the main purpose of data drift detection in MLOps?
easy
A. To reduce the size of the dataset
B. To check if new data differs significantly from the training data
C. To improve the speed of model training
D. To increase the number of features in the model
Solution
Step 1: Understand data drift concept
Data drift means the distribution of new data differs from that of the data used to train the model.
Step 2: Identify the purpose of detection
Detecting data drift helps decide when to retrain or update the model to keep it accurate.
Final Answer:
To check if new data differs significantly from the training data -> Option B
Quick Check:
Data drift detection = check data difference [OK]
Hint: Data drift means new data differs from the training data [OK]
Common Mistakes:
Confusing data drift with model speed optimization
Thinking data drift reduces dataset size
Assuming data drift adds features
2. Which Python library is commonly used for detecting data drift in MLOps?
easy
A. Flask
B. NumPy
C. Matplotlib
D. Evidently
Solution
Step 1: Recall common MLOps tools
Evidently is a popular tool designed specifically for monitoring data and model drift.
Step 2: Differentiate from other libraries
NumPy handles numerical computing, Matplotlib handles plotting, and Flask serves web apps; none of them is built for drift detection.
Final Answer:
Evidently -> Option D
Quick Check:
Evidently = data drift detection tool [OK]
Hint: Evidently is made for data drift detection [OK]
Common Mistakes:
Choosing NumPy or Matplotlib which are not for drift detection
Confusing Flask as a data tool
3. Given an Evidently Report configured for data drift, what will report.run(reference_data=reference_data, current_data=current_data) do?
medium
A. Visualize the model architecture
B. Train a new model on current_data
C. Compare current_data with reference_data to detect data drift
D. Delete old data from the system
Solution
Step 1: Understand Evidently report usage
The run method compares new data (current_data) against reference data to find differences.
Step 2: Identify the purpose of the method
It does not train models, visualize architecture, or delete data; it detects data drift.
Final Answer:
Compare current_data with reference_data to detect data drift -> Option C
Quick Check:
report.run compares data for drift [OK]
Hint: report.run compares new vs reference data [OK]
Common Mistakes:
Thinking it trains a model
Assuming it visualizes model structure
Believing it deletes data
4. You wrote this code with Evidently's legacy Dashboard API to detect data drift, but it raises an error:
from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataDriftTab
dashboard = Dashboard(tabs=[DataDriftTab()])
dashboard.calculate(current_data)
What is the likely mistake?
medium
A. Missing reference data argument in dashboard.calculate()
B. Incorrect import statement for Dashboard
C. Dashboard does not support a data drift tab
D. current_data is not a valid variable name
Solution
Step 1: Check what Dashboard.calculate() needs
For a data drift tab, Dashboard.calculate() needs both a reference dataset and a current dataset to compare.
Step 2: Identify missing argument
Only current_data is passed; there is no reference dataset to compare it against, which causes the error.
Final Answer:
Missing reference data argument in dashboard.calculate() -> Option A
Quick Check:
Dashboard.calculate needs reference and current data [OK]
Hint: Dashboard.calculate needs two datasets: reference and current [OK]
Common Mistakes:
Assuming import is wrong
Thinking the data drift tab is unsupported
Believing variable name causes error
5. You want to automate retraining your model when data drift is detected. Which approach best fits this goal?
hard
A. Set up a monitoring pipeline that runs data drift detection daily and triggers retraining if drift is found
B. Retrain the model every week regardless of data changes
C. Manually check data drift reports and retrain when you have time
D. Ignore data drift and only retrain when model accuracy drops
Solution
Step 1: Understand automation in MLOps
Automating retraining based on data drift ensures the model stays accurate without manual checks.
Step 2: Identify best practice
Running daily drift detection and triggering retraining only when drift occurs is efficient and effective.
Final Answer:
Set up a monitoring pipeline that runs data drift detection daily and triggers retraining if drift is found -> Option A
Quick Check:
Automate retrain on drift detection = best practice [OK]
Hint: Automate retrain triggered by drift detection [OK]
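The drift-triggered retraining idea from question 5 can be sketched as one function that a scheduler (for example a daily cron job, assumed here and not shown) would call. Everything named below is hypothetical: `mean_shift_detector` stands in for whatever drift test the team uses, and `fake_retrain` stands in for a real training job.

```python
import statistics


def mean_shift_detector(reference, current, threshold=3.0):
    """Stand-in detector: drift if the mean moves more than `threshold`
    reference standard deviations."""
    spread = statistics.stdev(reference)
    if spread == 0:
        return statistics.fmean(current) != statistics.fmean(reference)
    shift = abs(statistics.fmean(current) - statistics.fmean(reference)) / spread
    return shift > threshold


def run_drift_check(reference, current, detect_drift, retrain):
    """One monitoring run: retrain only when the detector reports drift."""
    if detect_drift(reference, current):
        retrain(current)
        return True
    return False


retrain_log = []


def fake_retrain(data):
    """Placeholder for a real training job; records the batch size it was given."""
    retrain_log.append(len(data))


# Illustrative daily batches for one feature.
reference_batch = [10, 12, 11, 13, 9, 10, 12, 11]
stable_batch = [11, 10, 12, 11]
drifted_batch = [25, 27, 26, 28]

stable_result = run_drift_check(reference_batch, stable_batch, mean_shift_detector, fake_retrain)
drifted_result = run_drift_check(reference_batch, drifted_batch, mean_shift_detector, fake_retrain)
```

Passing the detector and the retraining step in as functions keeps the trigger logic independent of any specific drift library or training framework.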