Concept drift detection in MLOps - Time & Space Complexity
Start learning this pattern below
Jump into concepts and practice - no test required
When detecting concept drift, we want to know how the time to check for changes grows as data increases.
We ask: How does the detection process scale with more incoming data?
Analyze the time complexity of the following code snippet.
# Sliding window concept drift detection
window_size = 100
for i in range(len(data) - window_size + 1):
window = data[i:i+window_size]
drift_score = calculate_drift(window)
if drift_score > threshold:
alert_drift(i)
This code slides a fixed-size window over the data to check for drift at each step.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Loop sliding the window over data and calculating drift each time.
- How many times: Once for each position in data minus window size plus one.
As data size grows, the number of windows checked grows roughly the same amount.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | ~(10 - 100 + 1) = 0 (no windows) |
| 100 | ~(100 - 100 + 1) = 1 (one window) |
| 1000 | ~(1000 - 100 + 1) = 901 windows checked |
Pattern observation: Operations grow linearly with data size once data is larger than window.
Time Complexity: O(n)
This means the time to detect drift grows directly with the amount of data.
[X] Wrong: "The detection time stays the same no matter how much data we have."
[OK] Correct: Each new data point adds a new window to check, so time grows with data size.
Understanding how detection time scales helps you design systems that handle growing data smoothly.
"What if we increased the window size as data grows? How would the time complexity change?"
Practice
Solution
Step 1: Understand concept drift meaning
Concept drift means the data changes over time, causing model accuracy to drop.Step 2: Identify the purpose of detection
Detecting drift helps know when the model needs updating to keep accuracy high.Final Answer:
To identify when the data distribution changes over time affecting model accuracy -> Option AQuick Check:
Concept drift detection = find data changes [OK]
- Confusing drift detection with speeding up training
- Thinking drift reduces dataset size
- Assuming drift improves hardware
Solution
Step 1: Identify drift detection methods
Drift detection compares model performance on new data to old data to find changes.Step 2: Evaluate options
Only comparing accuracy over time relates to drift detection; others affect training but not drift.Final Answer:
Compare model accuracy on recent data versus older data -> Option DQuick Check:
Drift detection = compare old vs new accuracy [OK]
- Confusing training hyperparameters with drift detection
- Thinking batch size or learning rate detect drift
- Ignoring performance comparison over time
old_accuracy = 0.85
new_accuracy = 0.70
threshold = 0.1
if old_accuracy - new_accuracy > threshold:
print('Drift detected')
else:
print('No drift')What will be the output?
Solution
Step 1: Calculate accuracy difference
old_accuracy - new_accuracy = 0.85 - 0.70 = 0.15Step 2: Compare difference to threshold
0.15 > 0.1, so condition is true and 'Drift detected' prints.Final Answer:
Drift detected -> Option CQuick Check:
0.15 > 0.1 means drift detected [OK]
- Mixing up greater than and less than signs
- Ignoring the threshold value
- Assuming syntax error due to > symbol
old_acc = 0.9
new_acc = 0.85
threshold = 0.05
if new_acc - old_acc > threshold:
print('Drift detected')
else:
print('No drift')What is the error?
Solution
Step 1: Understand drift detection logic
Drift means accuracy drops, so we check if old accuracy minus new accuracy exceeds threshold.Step 2: Analyze the condition
The code checks if new_acc - old_acc > threshold, which is negative here (0.85 - 0.9 = -0.05), so it won't detect drift correctly.Final Answer:
The condition should be 'old_acc - new_acc > threshold' to detect accuracy drop -> Option BQuick Check:
Check accuracy drop as old - new > threshold [OK]
- Subtracting in wrong order
- Assuming threshold value causes error
- Thinking print statements cause problem
Solution
Step 1: Understand concept drift detection methods
Detecting drift by monitoring data distribution changes helps catch shifts before accuracy drops.Step 2: Evaluate options for best practice
Monitor statistical differences in feature distributions between training and recent data uses statistical tests on features, which is a proactive and effective drift detection method. Other options either ignore data changes or waste resources.Final Answer:
Monitor statistical differences in feature distributions between training and recent data -> Option AQuick Check:
Data distribution monitoring = best drift detection [OK]
- Retraining blindly without drift detection
- Ignoring data changes and only watching accuracy
- Assuming bigger models fix drift automatically
