Evidently AI for monitoring in MLOps - Time & Space Complexity
When using Evidently AI to monitor machine learning models, it's important to understand how processing time grows as the amount of data increases.
In other words: how does the monitoring workload change as we feed in more data for analysis?
Analyze the time complexity of the following Evidently AI monitoring code snippet.
```python
import pandas as pd
from evidently.report import Report  # Evidently 0.4.x API
from evidently.metric_preset import DataDriftPreset

# Reference and current data must share the same schema
reference_data = pd.DataFrame({"feature_a": [1, 2, 3, 4], "feature_b": [0.1, 0.2, 0.3, 0.4]})
current_data = pd.DataFrame({"feature_a": [2, 3, 4, 5], "feature_b": [0.3, 0.4, 0.5, 0.6]})

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_data, current_data=current_data)
report.save_html("monitoring_report.html")
```
This code builds a report that detects data drift by comparing the reference and current datasets, then saves the result as an HTML file.
Look at what repeats when the report calculates data drift.
- Primary operation: Comparing each feature's distribution between reference and current datasets.
- How many times: Once per feature, and once for every data point in both datasets.
As the number of data points grows, the time to compare distributions grows roughly in proportion.
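To make the growth concrete, here is a minimal sketch of what a per-feature drift check might do internally (illustrative only, not Evidently's actual implementation): compare a summary statistic of each feature across both datasets, touching every data point once per feature.

```python
def drift_scores(reference, current):
    """Compare the mean of each feature between two datasets: O(n * f) work."""
    ops = 0
    scores = {}
    for feature in reference:                    # f iterations
        ref_vals = reference[feature]
        cur_vals = current[feature]
        ops += len(ref_vals) + len(cur_vals)     # every data point visited once
        ref_mean = sum(ref_vals) / len(ref_vals)
        cur_mean = sum(cur_vals) / len(cur_vals)
        scores[feature] = abs(ref_mean - cur_mean)
    return scores, ops

# Growing the number of rows by 10x grows the operation count by 10x:
small = {"a": [1.0] * 10, "b": [2.0] * 10}
large = {"a": [1.0] * 100, "b": [2.0] * 100}
_, ops_small = drift_scores(small, small)   # 2 features * (10 + 10) = 40
_, ops_large = drift_scores(large, large)   # 2 features * (100 + 100) = 400
```

The `ops` counter makes the pattern in the table below easy to verify by hand: the work is the number of features times the number of points scanned per feature.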
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | Small number of comparisons per feature |
| 100 | About 10 times more comparisons |
| 1000 | About 100 times more comparisons |
Pattern observation: The workload grows linearly with the number of data points and features.
Time Complexity: O(n * f)
This means the time grows proportionally with the number of data points (n) and the number of features (f) being monitored.
[X] Wrong: "The monitoring time stays the same no matter how much data we have."
[OK] Correct: More data means more comparisons to detect drift, so the time increases with data size.
Understanding how monitoring scales with data size helps you design efficient ML pipelines and shows you can think about real-world system performance.
"What if we added more complex drift checks that compare pairs of features? How would the time complexity change?"
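One way to reason about this follow-up question: comparing every unordered pair of features gives f(f-1)/2 pairs, and each pairwise check still has to scan all n data points, so the total work grows to O(n * f^2). A small counting sketch (a hypothetical helper, not part of Evidently):

```python
def pairwise_check_ops(n_points, n_features):
    # Every unordered pair of features is compared once, and each
    # comparison scans all n data points: O(n * f^2) total work.
    pairs = n_features * (n_features - 1) // 2
    return pairs * n_points

# Doubling the feature count roughly quadruples the work:
ops_10 = pairwise_check_ops(1000, 10)   # 45 pairs * 1000 = 45000
ops_20 = pairwise_check_ops(1000, 20)   # 190 pairs * 1000 = 190000
```

So with pairwise checks, adding features is now the expensive direction: the data-point term stays linear, but the feature term becomes quadratic.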