Why pipelines automate the ML workflow in MLOps - Performance Analysis
We want to understand how the time to run an ML pipeline changes as the amount of data or steps grows.
How does automating steps in a pipeline affect the total work done?
Analyze the time complexity of the following ML pipeline code snippet.
    for step in pipeline_steps:
        data = step.run(data)
This code runs each step in a pipeline one after another, passing data through.
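To make the loop concrete, here is a minimal runnable sketch. The `Step` class, the step names, and the sample data are hypothetical and only illustrate the idea; the original snippet does not say how `pipeline_steps` is built.

```python
class Step:
    """Hypothetical pipeline step: wraps a function that transforms the data."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def run(self, data):
        # Apply this step's transformation and return the new data.
        return self.fn(data)


# Three illustrative steps; real pipelines would load, clean, train, and so on.
pipeline_steps = [
    Step("drop_missing", lambda rows: [r for r in rows if r is not None]),
    Step("scale", lambda rows: [r / 10 for r in rows]),
    Step("round", lambda rows: [round(r, 1) for r in rows]),
]

data = [10, None, 25, 40]

# Same loop as in the snippet: each step's output feeds the next step.
for step in pipeline_steps:
    data = step.run(data)

print(data)  # [1.0, 2.5, 4.0]
```

Each step waits for the previous one to finish, which is why the step costs simply add up.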
Look at what repeats in this pipeline execution.
- Primary operation: Running each pipeline step once in order.
- How many times: Once per step, sequentially.
As the number of steps increases, the total time grows linearly, assuming each step takes roughly the same amount of time on a given dataset.
| Number of steps (n) | Approx. step runs |
|---|---|
| 5 | 5 |
| 10 | 10 |
| 20 | 20 |
Pattern observation: Doubling steps roughly doubles total work.
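The pattern can be checked by counting how many times a step actually runs. This is a small sketch; the helper `count_step_runs` and the do-nothing step are made up for illustration and are not part of any real pipeline library.

```python
def count_step_runs(num_steps):
    """Run a pipeline of `num_steps` trivial steps and count the step runs."""
    runs = 0

    def passthrough_step(data):
        # Hypothetical step: does no real work, only records that it ran.
        nonlocal runs
        runs += 1
        return data

    pipeline_steps = [passthrough_step] * num_steps
    data = [1, 2, 3]
    for step in pipeline_steps:
        data = step(data)
    return runs


for n in (5, 10, 20):
    print(n, "steps ->", count_step_runs(n), "step runs")
```

The counts match the step counts exactly: going from 10 to 20 steps doubles the number of runs.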
Time Complexity: O(n)
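This means the total time grows in proportion to the number of pipeline steps, where n is the step count. The cost of each individual step still depends on how much data it processes.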
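The bound can also be written as a short derivation. This is a sketch under an assumption not stated above: step i takes time t_i on a fixed dataset, and every t_i is at most some constant t_max that does not depend on the number of steps.

```latex
T(n) = \sum_{i=1}^{n} t_i \;\le\; n \cdot t_{\max} = O(n)
```

If the steps have very different costs, the sum is dominated by the slowest ones, but it still grows no faster than linearly in n under this assumption.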
[X] Wrong: "Adding more steps won't affect total time much because they run automatically."
[OK] Correct: Even automated steps take time; more steps mean more work done in sequence.
Knowing how step costs add up lets you explain a workflow's efficiency clearly and shows that you understand what automation actually changes: it removes manual effort, not the work each step has to do.
"What if some pipeline steps ran in parallel? How would the time complexity change?"