Pipeline versioning and reproducibility in MLOps - Time & Space Complexity
When working with machine learning pipelines, it is important to understand how the time to run a pipeline changes as the pipeline grows or changes versions.
We want to know how the execution time scales when we add more steps or data to the pipeline.
Analyze the time complexity of the following pipeline execution code.
for step in pipeline.steps:
    data = step.run(data)
    save_version(step.name, data)
This code runs each step in a pipeline sequentially, passing data along and saving the output version for reproducibility.
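To make the loop concrete, here is a minimal runnable sketch. The `Step` and `Pipeline` classes, the `run_pipeline` wrapper, and the in-memory `VERSIONS` store are hypothetical stand-ins rather than the API of any particular MLOps tool. The sketch assumes step outputs are JSON-serializable and uses a content hash as the version identifier, so identical outputs always map to the same version.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical in-memory version store: step name -> content hash of its output.
VERSIONS: Dict[str, str] = {}


@dataclass
class Step:
    name: str
    fn: Callable[[Any], Any]

    def run(self, data: Any) -> Any:
        return self.fn(data)


@dataclass
class Pipeline:
    steps: List[Step] = field(default_factory=list)


def save_version(name: str, data: Any) -> str:
    """Record a deterministic hash of a step's output (assumes JSON-serializable data)."""
    payload = json.dumps(data, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    VERSIONS[name] = digest
    return digest


def run_pipeline(pipeline: Pipeline, data: Any) -> Any:
    # One run and one save per step: the loop body executes len(pipeline.steps) times.
    for step in pipeline.steps:
        data = step.run(data)
        save_version(step.name, data)
    return data


pipeline = Pipeline([
    Step("scale", lambda xs: [x * 2 for x in xs]),
    Step("shift", lambda xs: [x + 1 for x in xs]),
])
result = run_pipeline(pipeline, [1, 2, 3])
print(result)         # [3, 5, 7]
print(len(VERSIONS))  # 2, one saved version per step
```

Because the hash depends only on the output, rerunning the same pipeline on the same input reproduces the same version identifiers, which is the reproducibility property the saves are meant to provide.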
Look at what repeats as the pipeline runs.
- Primary operation: Running each pipeline step one after another.
- How many times: Once for each step in the pipeline.
As the number of steps increases, the total time grows roughly in direct proportion, provided each step's run and save take about the same time.
| Input Size (steps) | Approx. Operations |
|---|---|
| 10 | 10 step runs + 10 saves |
| 100 | 100 step runs + 100 saves |
| 1000 | 1000 step runs + 1000 saves |
Pattern observation: Doubling the number of steps roughly doubles the total execution time.
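The counts in the table can be reproduced with a small sketch that tallies step runs and saves instead of timing them. The `count_operations` helper is an illustrative assumption; its loop body only stands in for the real `step.run` and `save_version` calls.

```python
def count_operations(num_steps: int) -> tuple:
    """Simulate the pipeline loop and count step runs and version saves."""
    runs = 0
    saves = 0
    data = 0
    for _ in range(num_steps):
        data += 1   # stands in for step.run(data)
        runs += 1
        saves += 1  # stands in for save_version(step.name, data)
    return runs, saves


for n in (10, 100, 1000):
    runs, saves = count_operations(n)
    print(f"{n} steps: {runs} step runs + {saves} saves")
```

Counting operations rather than measuring wall-clock time keeps the result deterministic and shows the pattern directly: each additional step adds exactly one run and one save.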
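Time Complexity: O(n), where n is the number of pipeline steps
This means the total time grows linearly with the number of pipeline steps, assuming the cost of running and saving a single step does not itself grow with n. If individual steps also scale with the amount of data they process, the total time is the sum of the per-step costs, so a single expensive step can dominate the whole run.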
[X] Wrong: "Adding more pipeline steps won't affect total runtime much because each step is small."
[OK] Correct: Even small steps add up, so more steps mean more total time, growing linearly.
Understanding how pipeline execution time grows helps you design efficient workflows and explain trade-offs clearly in real projects.
"What if we parallelize some pipeline steps? How would the time complexity change?"
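One way to explore this question is to compare the sum of all step durations (sequential execution) with the length of the longest dependency chain (the best case with unlimited parallel workers). The step names, durations, and dependency graph below are made-up values for illustration, and `finish_time` is a hypothetical helper, not part of any scheduler.

```python
from functools import lru_cache

# Hypothetical step durations in seconds; illustrative values, not measurements.
DURATIONS = {
    "load": 2.0,
    "clean": 3.0,
    "features_a": 4.0,
    "features_b": 5.0,
    "train": 6.0,
}

# Hypothetical dependency graph: each step lists the steps it must wait for.
DEPENDS_ON = {
    "load": [],
    "clean": ["load"],
    "features_a": ["clean"],
    "features_b": ["clean"],
    "train": ["features_a", "features_b"],
}


@lru_cache(maxsize=None)
def finish_time(step: str) -> float:
    """Earliest finish time of a step, assuming unlimited parallel workers."""
    start = max((finish_time(dep) for dep in DEPENDS_ON[step]), default=0.0)
    return start + DURATIONS[step]


# Sequential execution: every step runs one after another.
sequential_time = sum(DURATIONS.values())

# Parallel execution: bounded below by the longest dependency chain.
parallel_time = max(finish_time(step) for step in DURATIONS)

print(sequential_time)  # 20.0
print(parallel_time)    # 16.0
```

Under these assumptions only `features_a` and `features_b` can overlap, so the saving is the shorter of the two (4 seconds). The total work is still O(n); parallelism shortens wall-clock time only as far as the dependency structure allows, and a fully sequential chain gains nothing.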