Technical Debt in ML Systems (MLOps): Time & Space Complexity
When working with machine learning systems, technical debt can slow the system down as it grows.
We want to understand how the cost of running or updating an ML pipeline scales as the number of models and the volume of new data increase.
Analyze the time complexity of the following ML pipeline update process.
for model in deployed_models:
    for data_batch in new_data:
        preprocess(data_batch)
        predictions = model.predict(data_batch)
        evaluate(predictions)
    retrain(model, full_training_data)
This code updates each deployed model by processing new data batches, making predictions, evaluating them, and then retraining the model.
Identify the repeated work: loops, recursion, and array traversals.
- Primary operation: Nested loops over models and data batches.
- How many times: for each model, every data batch is processed, then the model is retrained once.
As the number of models or data batches grows, the total work grows by multiplying these counts.
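A minimal counting sketch (a hypothetical mock of the loop, not the real pipeline) makes the multiplicative growth concrete and reproduces the table below:

```python
def count_operations(num_models, num_batches):
    """Count per-batch operations and retrains for the nested update loop."""
    batch_ops = 0
    retrains = 0
    for _ in range(num_models):          # outer loop: m models
        for _ in range(num_batches):     # inner loop: n data batches
            batch_ops += 1               # preprocess + predict + evaluate
        retrains += 1                    # one retrain per model
    return batch_ops, retrains

for size in (10, 100, 1000):
    ops, retrains = count_operations(size, size)
    print(f"{size} models x {size} batches -> {ops} batch operations, {retrains} retrains")
```

Doubling either the model count or the batch count doubles the batch operations; doubling both quadruples them.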
| Input Size (models x data batches) | Approx. Operations |
|---|---|
| 10 models x 10 batches | ~100 operations plus 10 retrains |
| 100 models x 100 batches | ~10,000 operations plus 100 retrains |
| 1000 models x 1000 batches | ~1,000,000 operations plus 1000 retrains |
Pattern observation: The work grows quickly as both models and data batches increase, multiplying the total steps.
Time Complexity: O(m * n)
The batch-processing time grows with the product of the number of models (m) and data batches (n); retraining adds one (typically expensive) run per model on top of that.
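To make the retraining term explicit, assume (hypothetically) that one batch costs 1 unit of work and one retrain costs R units; the total work is then m*n + m*R:

```python
def total_cost(m, n, retrain_cost):
    """Total work units: m*n per-batch operations plus m retrains of cost R each."""
    return m * n + m * retrain_cost

# 100 models, 100 batches, retraining 50x as costly as one batch:
print(total_cost(100, 100, 50))  # 100*100 + 100*50 = 15000
```

When R is large, the m*R retraining term can dominate even though the asymptotic batch-processing term is O(m * n).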
[X] Wrong: "The time grows only with the number of models, not data batches."
[OK] Correct: Each model processes all data batches, so both counts multiply the total work.
Understanding how technical debt affects ML system updates helps you explain real challenges in keeping models fresh and efficient.
"What if retraining was done only once for all models together? How would the time complexity change?"