# Kubernetes for ML workloads in MLOps - Time & Space Complexity
When running machine learning workloads on Kubernetes, it is important to understand how total completion time grows as the workload scales. We want to know how the system handles more data or more jobs, and how that affects execution time.
Analyze the time complexity of the following Kubernetes job submission code for ML workloads.
```sh
for job in ml_jobs/*.yaml; do
  kubectl apply -f "$job"                                        # submit the Job
  kubectl wait --for=condition=complete -f "$job" --timeout=-1s  # block until it finishes
done
```
This code submits multiple ML jobs to Kubernetes one after another and waits for each to finish before starting the next.
Look at what repeats in this code.
- Primary operation: Submitting and waiting for each ML job to complete.
- How many times: Once for each job in the list.
As the number of ML jobs increases, the total time grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 job submissions and waits |
| 100 | 100 job submissions and waits |
| 1000 | 1000 job submissions and waits |
Pattern observation: Doubling the number of jobs roughly doubles the total time because jobs run one after another.
Time Complexity: O(n)
This means the total time grows linearly with the number of ML jobs submitted.
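The linear growth is easy to see in a toy timing model (a sketch only; `job_times` is a hypothetical list of per-job durations, not anything returned by the Kubernetes API):

```python
def sequential_total_time(job_times):
    """Wall-clock time when each job must finish before the next is submitted."""
    total = 0.0
    for t in job_times:
        total += t  # submit job, then block until it completes
    return total

# Doubling the number of jobs roughly doubles the total time: O(n).
print(sequential_total_time([5.0] * 10))  # 10 jobs x 5s each -> 50.0
print(sequential_total_time([5.0] * 20))  # 20 jobs x 5s each -> 100.0
```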
[X] Wrong: "Submitting jobs one by one is always faster because it avoids overload."
[OK] Correct: Running jobs sequentially means each job must finish before the next can start, so total time is the sum of all job runtimes and grows linearly; independent jobs could instead run in parallel to reduce wall-clock time.
Understanding how job submission scales helps you design better ML pipelines on Kubernetes and demonstrates that you can reason clearly about system efficiency.
"What if we submitted all ML jobs at once without waiting? How would the time complexity change?"
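One way to reason about this question: if every job is submitted up front and the cluster has capacity to run them all concurrently (a simplifying assumption; real clusters schedule subject to quotas and node capacity), submission itself still costs O(n), but wall-clock completion time is bounded by the slowest job rather than the sum. A sketch of that model:

```python
def parallel_total_time(job_times):
    """Wall-clock time when all jobs are submitted at once and run
    concurrently: dominated by the slowest job, not the sum."""
    return max(job_times, default=0.0)

# Same 20 jobs of 5s each: sequential total is 100s,
# but the parallel wall-clock time is just 5s.
print(parallel_total_time([5.0] * 20))  # -> 5.0
```

In practice the answer sits between the two models: the cluster runs as many Jobs in parallel as scheduling allows, so wall-clock time depends on both the number of jobs and the available capacity.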