Pipeline components and DAGs in MLOps - Time & Space Complexity
When working with pipelines and DAGs, it is important to understand how total execution time grows as the pipeline gets bigger: specifically, how the number of tasks and their dependencies drive the time to run the whole pipeline.
Analyze the time complexity of the following pipeline execution code.
```python
for task in dag.tasks:  # assumes dag.tasks is in topological order
    if all(dep.is_complete() for dep in task.dependencies):
        task.run()
```
This code runs each task in a DAG only after its dependencies are complete.
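To make the loop above concrete, here is a minimal, self-contained sketch. The `Task` and `DAG` classes, and the `is_complete`/`run` method names, are assumptions invented for illustration, not a real framework's API; the sketch also assumes tasks are listed in topological order so a single pass suffices.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """Hypothetical task node (names are illustrative assumptions)."""
    name: str
    dependencies: list = field(default_factory=list)
    complete: bool = False

    def is_complete(self) -> bool:
        return self.complete

    def run(self) -> None:
        # A real task would do work here; we just mark completion.
        self.complete = True

@dataclass
class DAG:
    tasks: list  # assumed to be topologically sorted

def execute(dag: DAG) -> None:
    for task in dag.tasks:
        # O(d) dependency checks per task, so O(n * d) checks overall.
        if all(dep.is_complete() for dep in task.dependencies):
            task.run()

# Example: a -> b -> c
a = Task("a")
b = Task("b", dependencies=[a])
c = Task("c", dependencies=[b])
dag = DAG(tasks=[a, b, c])
execute(dag)
print([t.complete for t in dag.tasks])  # [True, True, True]
```

Because the tasks are already in dependency order, every check succeeds on the first (and only) pass; an unsorted task list would need repeated passes or a proper scheduler.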
Look at what repeats as the pipeline runs.
- Primary operation: Checking dependencies for each task.
- How many times: Once per task, and for each dependency of that task.
As the number of tasks grows, the checks increase based on how many dependencies each task has.
| Input Size (n tasks) | Approx. Dependency Checks (d = 3 per task) |
|---|---|
| 10 | ~30 |
| 100 | ~300 |
| 1000 | ~3000 |
Pattern observation: The total checks grow roughly in proportion to the number of tasks times their dependencies.
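The table's arithmetic can be reproduced directly; the `total_checks` helper below is a hypothetical name for the simple product n * d:

```python
def total_checks(n_tasks: int, avg_deps: int) -> int:
    """Approximate dependency checks: one per (task, dependency) pair."""
    return n_tasks * avg_deps

for n in (10, 100, 1000):
    print(n, total_checks(n, avg_deps=3))  # 30, 300, 3000
```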
Time Complexity: O(n * d)
This means the total work grows with the number of tasks (n) multiplied by the average number of dependencies per task (d): for a fixed average d, the checks grow linearly in n.
[X] Wrong: "The time to run the pipeline grows only with the number of tasks, ignoring dependencies."
[OK] Correct: Each task must check all its dependencies, so dependencies add to the total work.
Understanding how pipeline execution time grows helps you design efficient workflows and explain your reasoning clearly in interviews.
"What if tasks could run in parallel without waiting for dependencies? How would the time complexity change?"