Data drift detection basics in MLOps - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is data drift in machine learning?
Data drift occurs when the statistical distribution of the data a model receives changes over time. The model then sees data unlike what it was trained on, and its accuracy can degrade.
beginner
Why is detecting data drift important?
Detecting data drift helps keep machine learning models accurate by alerting us when the data changes, so we can update or retrain the model.
intermediate
Name one common method to detect data drift.
One common method is to compare statistical properties of new data, such as the mean or the full distribution, against the training data, using a test like the two-sample Kolmogorov-Smirnov test.
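To make "comparing distributions" concrete, here is a minimal sketch that computes a mean shift and the two-sample KS statistic (the largest gap between the two empirical CDFs) by hand. The helper name `ks_statistic` and the sample values are made up for illustration; in practice a library routine would also supply a p-value.

```python
from bisect import bisect_right
from statistics import mean


def ks_statistic(reference, current):
    """Largest vertical gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in set(ref) | set(cur):
        ref_cdf = bisect_right(ref, x) / len(ref)
        cur_cdf = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(ref_cdf - cur_cdf))
    return gap


training = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up reference values
incoming = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]  # made-up values, shifted upward

mean_shift = mean(incoming) - mean(training)
statistic = ks_statistic(training, incoming)
print(f"mean shift: {mean_shift:.2f}, KS statistic: {statistic:.2f}")
# prints: mean shift: 3.00, KS statistic: 0.50
```

A statistic near 0 means the two samples look alike; a value near 1 means they barely overlap.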
intermediate
What role does a monitoring system play in data drift detection?
A monitoring system automatically checks incoming data for changes and alerts the team if data drift is detected, enabling quick action to maintain model performance.
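The check-then-alert loop described above could be sketched roughly as follows. The `DriftMonitor` class, its `max_z` threshold, and the choice of a batch-mean z-score as the drift signal are all assumptions made for illustration; a production system would likely use richer tests and a real alerting channel.

```python
from statistics import mean, stdev


class DriftMonitor:
    """Flags batches whose mean strays too far from the training mean."""

    def __init__(self, training_values, max_z=3.0, alert=print):
        self.train_mean = mean(training_values)
        self.train_std = stdev(training_values)
        self.max_z = max_z  # hypothetical alert threshold
        self.alert = alert  # callback; a real system might page or email

    def check(self, batch):
        # Standard error of the batch mean under the training distribution.
        std_error = self.train_std / len(batch) ** 0.5
        z = abs(mean(batch) - self.train_mean) / std_error
        drifted = z > self.max_z
        if drifted:
            self.alert(f"drift suspected: batch mean z-score {z:.1f}")
        return drifted


alerts = []
monitor = DriftMonitor([10, 11, 9, 10, 12, 8, 10, 11, 9, 10], alert=alerts.append)
similar = monitor.check([10, 9, 11, 10])   # batch that resembles training data
shifted = monitor.check([15, 16, 14, 15])  # batch with a clearly higher mean
print(similar, shifted, alerts)
```

Passing the alert function in as a callback keeps the check itself easy to test without sending real notifications.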
beginner
How can you respond when data drift is detected?
You can investigate whether the change is expected or signals a problem, retrain the model with new data, or adjust the model.
What does data drift affect in a machine learning model?
A. Model deployment time
B. Model training speed
C. Model size
D. Model accuracy
Which statistical test is commonly used to detect data drift?
A. Chi-square test for independence
B. T-test for means
C. Kolmogorov-Smirnov test
D. ANOVA
What should you do first when data drift is detected?
A. Investigate the cause of the drift
B. Ignore it if the model still works
C. Retrain the model immediately
D. Delete the old data
How is data drift monitoring usually done?
A. Using automated monitoring tools
B. Manually checking data daily
C. Only during model training
D. By checking model code
Which of these is NOT a sign of data drift?
A. Change in data distribution
B. Model training time decreases
C. Model accuracy drops
D. New data has unexpected values
Explain what data drift is and why it matters for machine learning models.
Think about how changing data affects a model's predictions.
Describe common ways to detect data drift and how to respond when it happens.
Consider tools and steps after noticing data changes.

      Practice

      1. What is the main purpose of data drift detection in machine learning?
      easy
      A. To check if new data differs significantly from the training data
      B. To improve the speed of model training
      C. To reduce the size of the training dataset
      D. To increase the number of features in the model

      Solution

      1. Step 1: Understand data drift concept

        Data drift detection monitors whether incoming data has changed relative to the data used to train the model.
      2. Step 2: Identify the purpose

        This helps ensure the model stays accurate by alerting when data changes too much.
      3. Final Answer:

        To check if new data differs significantly from the training data -> Option A
      4. Quick Check:

        Data drift detection = detecting data changes [OK]
      Hint: Data drift means new data differs from old data [OK]
      Common Mistakes:
      • Confusing data drift with model training speed
      • Thinking data drift reduces dataset size
      • Believing data drift adds features
      2. Which of the following is a correct Python code snippet to check data drift using the Kolmogorov-Smirnov test on two datasets data_train and data_new?
      easy
      A. from scipy.stats import ks_test
         result = ks_test(data_train, data_new)
         print(result.pvalue)
      B. from scipy.stats import ks_2samp
         result = ks_2samp(data_train, data_new)
         print(result.pvalue)
      C. from sklearn.drift import ks_test
         result = ks_test(data_train, data_new)
         print(result.pvalue)
      D. import stats
         result = stats.ks_test(data_train, data_new)
         print(result.pvalue)

      Solution

      1. Step 1: Identify correct import and function

        The two-sample Kolmogorov-Smirnov test is available as scipy.stats.ks_2samp. SciPy has no ks_test function, and scikit-learn has no sklearn.drift module.
      2. Step 2: Check function usage

        Calling ks_2samp(data_train, data_new) returns a result object with statistic and pvalue attributes.
      3. Final Answer:

        from scipy.stats import ks_2samp; result = ks_2samp(data_train, data_new); print(result.pvalue) -> Option B
      4. Quick Check:

        Correct import and function = scipy.stats.ks_2samp [OK]
      Hint: Use scipy.stats.ks_2samp for data drift test [OK]
      Common Mistakes:
      • Using wrong module or function name
      • Trying to import non-existent ks_test
      • Confusing sklearn with scipy for this test
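As a rough illustration of how `ks_2samp` behaves, the sketch below compares a reference sample against one sample from the same distribution and one whose mean is shifted. All three samples are synthetic stand-ins generated with a fixed seed, not real feature data, and the sample size of 500 is an arbitrary choice.

```python
import random

from scipy.stats import ks_2samp

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic stand-ins for one numeric feature (not real data).
data_train = [rng.gauss(0.0, 1.0) for _ in range(500)]
data_same = [rng.gauss(0.0, 1.0) for _ in range(500)]
data_shifted = [rng.gauss(1.0, 1.0) for _ in range(500)]

same_result = ks_2samp(data_train, data_same)
shifted_result = ks_2samp(data_train, data_shifted)

print(f"same distribution: D={same_result.statistic:.3f}, p={same_result.pvalue:.3f}")
print(f"mean shifted by 1: D={shifted_result.statistic:.3f}, p={shifted_result.pvalue:.3g}")
```

The shifted sample should produce a much larger statistic and a far smaller p-value than the unshifted one.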
      3. Given the following Python code to detect data drift, what will be the output if data_train = [1, 2, 3, 4, 5] and data_new = [1, 2, 3, 4, 10]?
      from scipy.stats import ks_2samp
      result = ks_2samp(data_train, data_new)
      print(round(result.pvalue, 2))
      medium
      A. 0.87
      B. 0.05
      C. 0.01
      D. 1.00

      Solution

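      1. Step 1: Understand the test and data

        The Kolmogorov-Smirnov test compares the empirical distributions of the two samples. Only one value differs (5 vs 10), so the largest gap between the two empirical CDFs is 0.2, reached at x = 5.
      2. Step 2: Interpret p-value meaning

        A high p-value (close to 1) means no evidence of a difference; a low one suggests drift. For two samples of five points each, 0.2 is the smallest non-zero value the statistic can take, so the exact p-value SciPy computes for these small samples is 1.0.
      3. Final Answer:

        print shows 1.0, listed as 1.00 -> Option D
      4. Quick Check:

        Smallest possible gap gives p-value = 1.0 [OK]
      Hint: Small data changes give high p-value (no evidence of drift) [OK]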
      Common Mistakes:
      • Assuming any difference means low p-value
      • Confusing p-value with test statistic
      • Rounding errors in output
      4. You wrote this code to detect data drift but get an error: AttributeError: module 'scipy.stats' has no attribute 'ks_test'. What is the fix?
      import scipy.stats as stats
      result = stats.ks_test(data_train, data_new)
      print(result.pvalue)
      medium
      A. Use stats.kstest instead of ks_test
      B. Import ks_test from scipy.stats explicitly
      C. Change ks_test to ks_2samp in the code
      D. Update scipy package to latest version

      Solution

      1. Step 1: Identify the error cause

        The error says ks_test does not exist in scipy.stats.
      2. Step 2: Use correct function name

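        The dedicated function for the two-sample KS test is ks_2samp, not ks_test. (scipy.stats.kstest also exists and accepts a second sample, but it is primarily the one-sample test, so ks_2samp states the intent directly.)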
      3. Final Answer:

        Change ks_test to ks_2samp in the code -> Option C
      4. Quick Check:

        Function name must be ks_2samp [OK]
      Hint: Use ks_2samp, not ks_test, for two-sample KS test [OK]
      Common Mistakes:
      • Trying to import non-existent ks_test
      • Using one-sample test function by mistake
      • Ignoring error message details
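One quick way to confirm which names a module actually exposes, before changing the call, is `hasattr`. The sketch below checks both names and then runs the corrected call; the sample values are made up for illustration.

```python
import scipy.stats as stats

data_train = [2.1, 2.4, 2.2, 2.8, 2.5, 2.3]  # made-up sample values
data_new = [2.0, 2.6, 2.4, 2.7, 2.2, 2.5]

# ks_test is not a SciPy name; ks_2samp is the two-sample test.
print(hasattr(stats, "ks_test"))   # False
print(hasattr(stats, "ks_2samp"))  # True

result = stats.ks_2samp(data_train, data_new)
print(result.statistic, result.pvalue)
```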
      5. You want to monitor data drift for multiple features in your dataset. Which approach best helps detect drift over time and alert you when it happens?
      hard
      A. Ignore data drift and focus on model accuracy metrics only
      B. Retrain the model daily without checking data changes
      C. Increase the model complexity to handle any data changes automatically
      D. Run a statistical test like KS test on each feature periodically and trigger alerts if p-value is below threshold

      Solution

      1. Step 1: Understand monitoring multiple features

        Checking each feature for drift helps catch changes in data distribution over time.
      2. Step 2: Use statistical tests and alerts

        Applying a test such as the KS test periodically and alerting on low p-values supports timely detection. When many features are tested at once, a stricter threshold helps limit false alarms.
      3. Final Answer:

        Run a statistical test like KS test on each feature periodically and trigger alerts if p-value is below threshold -> Option D
      4. Quick Check:

        Periodic tests + alerts = best drift monitoring [OK]
      Hint: Test features regularly and alert on low p-values [OK]
      Common Mistakes:
      • Retraining blindly without drift checks
      • Ignoring drift and trusting accuracy alone
      • Assuming complex models fix drift automatically
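The per-feature approach in Option D might look something like the sketch below. The function name `drifted_features`, the feature names, and the synthetic data are all hypothetical, and splitting the threshold across features (a Bonferroni correction) is an added assumption rather than something the question requires.

```python
import random

from scipy.stats import ks_2samp


def drifted_features(reference, current, alpha=0.05):
    """Return {feature: p-value} for features whose KS p-value is below the threshold.

    `reference` and `current` map feature names to lists of numeric values.
    alpha is split across features (Bonferroni) so that testing many features
    does not inflate false alarms; that correction is an assumption here.
    """
    threshold = alpha / len(reference)
    flagged = {}
    for name, ref_values in reference.items():
        p_value = ks_2samp(ref_values, current[name]).pvalue
        if p_value < threshold:
            flagged[name] = p_value
    return flagged


rng = random.Random(42)  # fixed seed; all data below is synthetic
reference = {
    "age": [rng.gauss(40, 10) for _ in range(400)],
    "income": [rng.gauss(50_000, 8_000) for _ in range(400)],
}
current = {
    "age": [rng.gauss(40, 10) for _ in range(400)],            # same distribution
    "income": [rng.gauss(60_000, 8_000) for _ in range(400)],  # shifted upward
}

for name, p_value in drifted_features(reference, current).items():
    print(f"ALERT: drift suspected in '{name}' (p={p_value:.2g})")
```

In a scheduled job, `current` would be the most recent window of production data, and the print call would be replaced by whatever alerting channel the team uses.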