GPU support in containers in MLOps - Time & Space Complexity
We want to understand how the time to start and run GPU-enabled containers changes as we add more GPUs or containers.
How does the setup and execution time grow when using GPUs inside containers?
Analyze the time complexity of the following container startup code with GPU support.
```python
for gpu_id in range(num_gpus):
    container = create_container(
        image='ml-gpu-image',
        runtime='nvidia',              # use the NVIDIA container runtime
        device=f'/dev/nvidia{gpu_id}'  # pin one GPU device to this container
    )
    container.start()                  # one start per GPU, executed sequentially
```
This code creates and starts one container per GPU, assigning each GPU device to its container.
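To make the pattern concrete, here is a minimal runnable sketch. `create_container` above is a stand-in for a real container SDK call; in this sketch it is replaced by a stub that just counts `start()` operations, so we can verify the loop performs exactly one start per GPU.

```python
class FakeContainer:
    """Stub for a container object; counts how many starts happen."""
    started = 0  # class-level counter of start() calls

    def start(self):
        FakeContainer.started += 1

def create_container(image, runtime, device):
    # A real implementation would talk to the container runtime;
    # this stub only exists to let the loop run end to end.
    return FakeContainer()

num_gpus = 4
for gpu_id in range(num_gpus):
    container = create_container(
        image='ml-gpu-image',
        runtime='nvidia',
        device=f'/dev/nvidia{gpu_id}',
    )
    container.start()

print(FakeContainer.started)  # → 4 (one start per GPU)
```

The counter confirms the loop body executes once per GPU, which is the repeated operation we analyze below.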
Look at what repeats as input grows.
- Primary operation: Loop creating and starting containers for each GPU.
- How many times: Once per GPU, so the number of GPUs (n) determines how many iterations run.
The time to start containers grows with the number of GPUs.
| Input Size (n) | Approx. Operations |
|---|---|
| 1 GPU | 1 container start |
| 4 GPUs | 4 container starts |
| 10 GPUs | 10 container starts |
Pattern observation: The time grows directly with the number of GPUs because each GPU needs its own container started.
Time Complexity: O(n)
This means the time to start GPU containers grows linearly with the number of GPUs used.
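We can sketch this linear growth with a small simulation. The fixed `STARTUP` delay below is an assumed, illustrative per-container startup cost, not a measured value; it stands in for the real work of creating and starting a container.

```python
import time

STARTUP = 0.02  # assumed per-container startup cost in seconds (illustrative)

def start_containers_sequentially(num_gpus):
    """Simulate the loop: each container start blocks before the next begins."""
    for _ in range(num_gpus):
        time.sleep(STARTUP)  # stand-in for create_container(...).start()

t0 = time.perf_counter()
start_containers_sequentially(5)
elapsed = time.perf_counter() - t0

# Sequential starts accumulate: total wall time is at least n * STARTUP.
print(elapsed >= 5 * STARTUP)  # → True
```

Doubling `num_gpus` doubles the total wall time, which is exactly the O(n) behavior described above.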
[X] Wrong: "Starting multiple GPU containers happens all at once, so time stays the same no matter how many GPUs."
[OK] Correct: Each container start takes time and happens one after another in this code, so total time adds up with more GPUs.
Understanding how resource setup time grows helps you design scalable machine learning pipelines and container orchestration strategies.
What if we started all GPU containers in parallel instead of a loop? How would the time complexity change?
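One way to explore this question is with a thread pool. The sketch below simulates container starts with a fixed sleep (an assumed stand-in for real startup work): run sequentially, wall time is roughly n x STARTUP, while run in parallel it is roughly the cost of the single slowest start. Note the total work is still O(n); only the wall-clock time shrinks.

```python
import time
from concurrent.futures import ThreadPoolExecutor

STARTUP = 0.05  # simulated per-container startup cost (seconds)

def start_container(gpu_id):
    time.sleep(STARTUP)  # stand-in for create_container(...).start()
    return gpu_id

num_gpus = 4

# Sequential: wall time ~ num_gpus * STARTUP
t0 = time.perf_counter()
for gpu_id in range(num_gpus):
    start_container(gpu_id)
sequential = time.perf_counter() - t0

# Parallel: wall time ~ STARTUP (the slowest single start)
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=num_gpus) as pool:
    list(pool.map(start_container, range(num_gpus)))
parallel = time.perf_counter() - t0

print(parallel < sequential)  # → True
```

In practice, parallel starts still contend for shared resources (image pulls, the container runtime daemon, the PCIe bus), so real speedups are usually less than ideal.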