ML lifecycle stages in MLOps - Time & Space Complexity
We want to understand how the time needed to complete an ML lifecycle changes as the amount of data or the number of tasks grows. How does the work increase when we add more datasets or models?
Analyze the time complexity of the following ML lifecycle stages code snippet.
```python
for dataset in datasets:
    preprocess(dataset)
    model = train_model(dataset)
    evaluate(model, dataset)
    deploy(model)
```
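To make the loop above concrete, here is a runnable sketch in which the four lifecycle functions are stand-in stubs (the names match the snippet, but their bodies are assumptions) that simply count how often a lifecycle step runs:

```python
# Counter for lifecycle steps executed; each stub increments it once.
calls = 0

def preprocess(dataset):
    global calls
    calls += 1

def train_model(dataset):
    global calls
    calls += 1
    return {"trained_on": dataset}  # stand-in for a trained model

def evaluate(model, dataset):
    global calls
    calls += 1

def deploy(model):
    global calls
    calls += 1

datasets = [f"dataset_{i}" for i in range(10)]
for dataset in datasets:
    preprocess(dataset)
    model = train_model(dataset)
    evaluate(model, dataset)
    deploy(model)

print(calls)  # 4 steps x 10 datasets = 40
```

With 10 datasets, the loop performs 40 steps; with 20 datasets it would perform 80, which previews the linear pattern analyzed below.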
This code runs the main ML lifecycle steps for each dataset in a list.
Look at what repeats as input grows.
- Primary operation: Running the full ML lifecycle (preprocess, train, evaluate, deploy) for each dataset.
- How many times: Once for each dataset in the list.
As the number of datasets increases, the total work grows proportionally.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 full ML lifecycles |
| 100 | 100 full ML lifecycles |
| 1000 | 1000 full ML lifecycles |
Pattern observation: Doubling the datasets doubles the total work.
Time Complexity: O(n)
This means the total time grows directly with the number of datasets processed.
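The doubling pattern can be checked directly. Assuming each dataset triggers the four steps from the snippet (a fixed constant per dataset), the operation count is a straight line in n:

```python
def lifecycle_ops(n):
    # 4 steps per dataset: preprocess, train, evaluate, deploy
    return 4 * n

for n in (10, 100, 1000):
    print(n, lifecycle_ops(n))
# Doubling n doubles the count: lifecycle_ops(2 * n) == 2 * lifecycle_ops(n)
```

The constant factor (4 here) changes the absolute numbers but not the growth rate, which is why the complexity is written simply as O(n).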
[X] Wrong: "Adding more datasets won't affect total time much because each step is fast."
[OK] Correct: Each dataset requires a full set of steps, so more datasets mean more total work and time.
Understanding how work grows with input size helps you explain and plan ML workflows clearly in real projects.
"What if we parallelize the training for all datasets? How would the time complexity change?"