Data drift detection basics in MLOps - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding Data Drift
What does data drift mean in the context of machine learning models?
A. A change in the input data distribution over time that can affect model performance
B. A sudden failure of the model due to hardware issues
C. An increase in the size of the training dataset
D. A change in the model's architecture during training
💡 Hint
Think about how the data the model sees might change after deployment.
💻 Command Output
intermediate
Detecting Data Drift with a Python Library
Given the following Python code snippet using the alibi-detect library to detect data drift, what will be the output if the new data distribution is significantly different from the reference data?
from alibi_detect.cd import KSDrift
import numpy as np

# Reference data: normal distribution
ref_data = np.random.normal(0, 1, 1000)

# New data: shifted mean
new_data = np.random.normal(2, 1, 1000)

cd = KSDrift(ref_data)
drift_result = cd.predict(new_data)
# is_drift is returned as an integer flag (1 or 0), so cast it to bool before printing
print(bool(drift_result['data']['is_drift']))
A. None
B. False
C. Raises TypeError
D. True
💡 Hint
KSDrift uses the Kolmogorov-Smirnov test to detect if distributions differ.
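For intuition, the same comparison can be sketched with SciPy's two-sample KS test directly. This is an illustration under assumptions, not alibi-detect's implementation: the fixed seed and the 0.05 significance level are choices made for this example.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

ref_data = rng.normal(0, 1, 1000)  # reference: mean 0
new_data = rng.normal(2, 1, 1000)  # new data: mean shifted to 2

result = ks_2samp(ref_data, new_data)

p_val_threshold = 0.05  # assumed significance level for this sketch
is_drift = bool(result.pvalue < p_val_threshold)

print(f"KS statistic: {result.statistic:.3f}")
print(f"drift detected: {is_drift}")
```

A mean shift of two standard deviations separates the two empirical distributions widely, so the p-value falls far below the cutoff and drift is flagged.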
Configuration
advanced
Configuring a Data Drift Monitoring Pipeline
Which of the following YAML configurations correctly sets up a data drift monitoring job that runs daily and alerts if drift is detected?
A
schedule: hourly
monitor:
  type: data_drift
  alert: false
  threshold: 0.5
B
schedule: daily
monitor:
  type: data_drift
  alert: true
  threshold: 0.05
C
schedule: daily
monitor:
  type: data_drift
  alert: yes
  threshold: 0.01
D
schedule: daily
monitor:
  type: drift_detection
  alert: true
  threshold: 0.05
💡 Hint
Check for correct keys and valid values for alert and threshold.
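The checks the hint describes can be sketched as a small validator. The schema here (keys schedule, monitor.type, monitor.alert, monitor.threshold, and their accepted values) is inferred from the options and is an assumption for illustration; it does not correspond to a specific monitoring tool, and validate_drift_config is a hypothetical helper.

```python
def validate_drift_config(config: dict) -> list:
    """Return a list of problems found in a drift-monitoring config dict.

    The expected keys and values are assumptions taken from the quiz options.
    """
    problems = []
    if config.get("schedule") != "daily":
        problems.append("schedule must be 'daily'")

    monitor = config.get("monitor", {})
    if monitor.get("type") != "data_drift":
        problems.append("monitor.type must be 'data_drift'")
    if monitor.get("alert") is not True:
        problems.append("monitor.alert must be the boolean true")

    threshold = monitor.get("threshold")
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or not 0 < threshold < 1:
        problems.append("monitor.threshold must be a number between 0 and 1")
    return problems


# The options above, written as already-parsed dictionaries.
option_a = {"schedule": "hourly",
            "monitor": {"type": "data_drift", "alert": False, "threshold": 0.5}}
option_b = {"schedule": "daily",
            "monitor": {"type": "data_drift", "alert": True, "threshold": 0.05}}
option_d = {"schedule": "daily",
            "monitor": {"type": "drift_detection", "alert": True, "threshold": 0.05}}

print(validate_drift_config(option_a))
print(validate_drift_config(option_b))  # []
print(validate_drift_config(option_d))
```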
Troubleshoot
advanced
Troubleshooting False Negatives in Data Drift Detection
A data drift detection system is not alerting even though the input data has clearly changed. Which of the following is the most likely cause?
A. The monitoring job is running too frequently
B. The model was retrained recently
C. The drift detection threshold is set too leniently (for example, a distance-score threshold that is too high or a p-value cutoff that is too low), making it insensitive to changes
D. The input data format changed but the detection system ignores format
💡 Hint
Consider how sensitivity settings affect detection.
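A minimal sketch of how the sensitivity setting can hide real drift. It assumes the detector alerts when the KS statistic (a distance between 0 and 1) exceeds a threshold; the two threshold values, the seed, and the size of the shift are illustrative choices, not settings from any particular tool.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(0.0, 1.0, 2000)
production = rng.normal(0.5, 1.0, 2000)  # mean shifted by half a standard deviation

# KS statistic: largest gap between the two empirical CDFs.
statistic = ks_2samp(reference, production).statistic

# Assumed alert rule for this sketch: alert when statistic > threshold.
for threshold in (0.1, 0.5):
    alert = bool(statistic > threshold)
    print(f"threshold={threshold}: statistic={statistic:.3f}, alert={alert}")
```

The data has clearly changed in both cases, but only the lower threshold raises an alert; the higher one silently misses the same shift.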
🔀 Workflow
expert
Order of Steps in Data Drift Detection Workflow
Arrange the following steps in the correct order for a typical data drift detection workflow.
A. 1, 2, 3, 4
B. 1, 3, 2, 4
C. 2, 1, 3, 4
D. 3, 1, 2, 4
💡 Hint
Think about the logical flow from data collection to alerting.

Practice

1. What is the main purpose of data drift detection in machine learning?
easy
A. To check if new data differs significantly from the training data
B. To improve the speed of model training
C. To reduce the size of the training dataset
D. To increase the number of features in the model

Solution

  1. Step 1: Understand data drift concept

    Data drift detection is about monitoring if new incoming data changes compared to the data used to train the model.
  2. Step 2: Identify the purpose

    This helps ensure the model stays accurate by alerting when data changes too much.
  3. Final Answer:

    To check if new data differs significantly from the training data -> Option A
  4. Quick Check:

    Data drift = detecting data changes [OK]
Hint: Data drift means new data differs from old data [OK]
Common Mistakes:
  • Confusing data drift with model training speed
  • Thinking data drift reduces dataset size
  • Believing data drift adds features
2. Which of the following is a correct Python code snippet to check data drift using the Kolmogorov-Smirnov test on two datasets data_train and data_new?
easy
A
from scipy.stats import ks_test
result = ks_test(data_train, data_new)
print(result.pvalue)
B
from scipy.stats import ks_2samp
result = ks_2samp(data_train, data_new)
print(result.pvalue)
C
from sklearn.drift import ks_test
result = ks_test(data_train, data_new)
print(result.pvalue)
D
import stats
result = stats.ks_test(data_train, data_new)
print(result.pvalue)

Solution

  1. Step 1: Identify correct import and function

    The Kolmogorov-Smirnov test is in scipy.stats as ks_2samp.
  2. Step 2: Check function usage

    Calling ks_2samp(data_train, data_new) returns a result with pvalue attribute.
  3. Final Answer:

    Import ks_2samp from scipy.stats, then call ks_2samp(data_train, data_new) and print result.pvalue -> Option B
  4. Quick Check:

    Correct import and function = scipy.stats.ks_2samp [OK]
Hint: Use scipy.stats.ks_2samp for data drift test [OK]
Common Mistakes:
  • Using wrong module or function name
  • Trying to import non-existent ks_test
  • Confusing sklearn with scipy for this test
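A small runnable sketch of the ks_2samp call wrapped in a helper. The helper name has_drifted, the 0.05 significance level, and the synthetic arrays are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp


def has_drifted(data_train, data_new, alpha=0.05):
    """Return True when the two-sample KS test rejects 'same distribution'."""
    result = ks_2samp(data_train, data_new)
    return bool(result.pvalue < alpha)


data_train = np.linspace(0.0, 1.0, 200)
data_similar = data_train + 0.001  # nearly identical values
data_shifted = data_train + 0.5    # every value moved up by 0.5

print(has_drifted(data_train, data_similar))  # False
print(has_drifted(data_train, data_shifted))  # True
```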
3. Given the following Python code to detect data drift, what will be the output if data_train = [1, 2, 3, 4, 5] and data_new = [1, 2, 3, 4, 10]?
from scipy.stats import ks_2samp
result = ks_2samp(data_train, data_new)
print(round(result.pvalue, 2))
medium
A. 0.87
B. 0.05
C. 0.01
D. 1.00

Solution

  1. Step 1: Understand the test and data

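    The Kolmogorov-Smirnov test compares the empirical distributions of the two samples. Here only one value differs (5 vs 10), so the largest gap between the two empirical CDFs is 0.2.
  2. Step 2: Interpret p-value meaning

    A high p-value (close to 1) means no significant difference; a low p-value means drift is detected. With five points per sample, 0.2 is the smallest nonzero gap possible, so the exact p-value is 1.0.
  3. Final Answer:

    1.00 (Python prints it as 1.0) -> Option D
  4. Quick Check:

    Small difference gives high p-value = 1.00 [OK]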
Hint: Small data changes give high p-value (no drift) [OK]
Common Mistakes:
  • Assuming any difference means low p-value
  • Confusing p-value with test statistic
  • Rounding errors in output
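To see where the test statistic comes from, the empirical CDFs of the two lists can be compared by hand and checked against ks_2samp. This is a sketch for illustration; the variable names grid, ecdf_train, and ecdf_new are introduced here and are not part of the question.

```python
import numpy as np
from scipy.stats import ks_2samp

data_train = np.array([1, 2, 3, 4, 5])
data_new = np.array([1, 2, 3, 4, 10])

# Evaluate both empirical CDFs at every observed value.
grid = np.sort(np.concatenate([data_train, data_new]))
ecdf_train = np.searchsorted(np.sort(data_train), grid, side="right") / len(data_train)
ecdf_new = np.searchsorted(np.sort(data_new), grid, side="right") / len(data_new)

# The KS statistic is the largest vertical gap between the two ECDFs.
manual_statistic = float(np.max(np.abs(ecdf_train - ecdf_new)))

result = ks_2samp(data_train, data_new)
print(f"manual statistic: {manual_statistic:.2f}")
print(f"scipy statistic:  {result.statistic:.2f}")
print(f"p-value:          {result.pvalue:.2f}")
```

The two ECDFs agree everywhere except at x = 5, where the training sample has reached 1.0 and the new sample is still at 0.8.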
4. You wrote this code to detect data drift but get an error: AttributeError: module 'scipy.stats' has no attribute 'ks_test'. What is the fix?
import scipy.stats as stats
result = stats.ks_test(data_train, data_new)
print(result.pvalue)
medium
A. Use stats.kstest instead of ks_test
B. Import ks_test from scipy.stats explicitly
C. Change ks_test to ks_2samp in the code
D. Update scipy package to latest version

Solution

  1. Step 1: Identify the error cause

    The error says ks_test does not exist in scipy.stats.
  2. Step 2: Use correct function name

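    The correct function for the two-sample KS test is ks_2samp; scipy.stats has no ks_test. (scipy.stats.kstest exists, but it is primarily the one-sample test, so ks_2samp is the direct fix.)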
  3. Final Answer:

    Change ks_test to ks_2samp in the code -> Option C
  4. Quick Check:

    Function name must be ks_2samp [OK]
Hint: Use ks_2samp, not ks_test, for two-sample KS test [OK]
Common Mistakes:
  • Trying to import non-existent ks_test
  • Using one-sample test function by mistake
  • Ignoring error message details
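The error can be confirmed, and the fix applied, with a quick check of which names scipy.stats actually exposes. The sample values for data_train and data_new are assumptions, since the question does not define them.

```python
import scipy.stats as stats

# Illustrative sample data (not given in the question).
data_train = [1, 2, 3, 4, 5]
data_new = [1, 2, 3, 4, 10]

# ks_test is not an attribute of scipy.stats, which is what raises the AttributeError.
print(hasattr(stats, "ks_test"))   # False
print(hasattr(stats, "ks_2samp"))  # True

# Corrected call: the two-sample KS test.
result = stats.ks_2samp(data_train, data_new)
print(result.pvalue)
```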
5. You want to monitor data drift for multiple features in your dataset. Which approach best helps detect drift over time and alert you when it happens?
hard
A. Ignore data drift and focus on model accuracy metrics only
B. Retrain the model daily without checking data changes
C. Increase the model complexity to handle any data changes automatically
D. Run a statistical test like KS test on each feature periodically and trigger alerts if p-value is below threshold

Solution

  1. Step 1: Understand monitoring multiple features

    Checking each feature for drift helps catch changes in data distribution over time.
  2. Step 2: Use statistical tests and alerts

    Applying tests like KS test periodically and alerting on low p-values ensures timely detection.
  3. Final Answer:

    Run a statistical test like KS test on each feature periodically and trigger alerts if p-value is below threshold -> Option D
  4. Quick Check:

    Periodic tests + alerts = best drift monitoring [OK]
Hint: Test features regularly and alert on low p-values [OK]
Common Mistakes:
  • Retraining blindly without drift checks
  • Ignoring drift and trusting accuracy alone
  • Assuming complex models fix drift automatically
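The per-feature approach can be sketched as below. Everything beyond "run a KS test on each feature and alert on a low p-value" is an assumption for illustration: the helper name check_feature_drift, the feature names, the synthetic data, and the Bonferroni adjustment (dividing the significance level by the number of features to limit false alarms when many tests run at once). Scheduling the check periodically, for example from a cron job or an orchestrator, is outside this sketch.

```python
import numpy as np
from scipy.stats import ks_2samp


def check_feature_drift(reference, current, feature_names, alpha=0.05):
    """Run a two-sample KS test per feature column and flag drifted features."""
    # Bonferroni correction: split alpha across the number of tests.
    per_test_alpha = alpha / len(feature_names)
    report = {}
    for i, name in enumerate(feature_names):
        result = ks_2samp(reference[:, i], current[:, i])
        report[name] = {
            "p_value": float(result.pvalue),
            "drift": bool(result.pvalue < per_test_alpha),
        }
    return report


rng = np.random.default_rng(7)
feature_names = ["age", "income", "clicks"]  # hypothetical feature names

reference = rng.normal(0, 1, size=(1000, 3))  # stand-in for training data
current = rng.normal(0, 1, size=(1000, 3))    # stand-in for a recent window
current[:, 1] += 1.5                          # inject drift into "income"

report = check_feature_drift(reference, current, feature_names)
drifted = [name for name, entry in report.items() if entry["drift"]]

if drifted:
    # In a real pipeline this would call an alerting hook instead of print.
    print(f"ALERT: drift detected in {drifted}")
```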