GPU support in containers in MLOps - Time & Space Complexity
We want to understand how the time to start and run GPU-enabled containers changes as we add more GPUs or containers.
How does the setup and execution time grow when using GPUs inside containers?
Analyze the time complexity of the following container startup code with GPU support.
for gpu_id in range(num_gpus):
    container = create_container(
        image='ml-gpu-image',
        runtime='nvidia',
        device=f'/dev/nvidia{gpu_id}'
    )
    container.start()
This code creates and starts one container per GPU, assigning each GPU device to its container.
Look at what repeats as input grows.
- Primary operation: Loop creating and starting containers for each GPU.
- How many times: Once per GPU, so the number of GPUs controls the repeats.
Starting containers grows with the number of GPUs.
| Input Size (n) | Approx. Operations |
|---|---|
| 1 GPU | 1 container start |
| 4 GPUs | 4 container starts |
| 10 GPUs | 10 container starts |
Pattern observation: The time grows directly with the number of GPUs because each GPU needs its own container started.
Time Complexity: O(n)
This means the time to start GPU containers grows linearly with the number of GPUs used.
[X] Wrong: "Starting multiple GPU containers happens all at once, so time stays the same no matter how many GPUs."
[OK] Correct: Each container start takes time and happens one after another in this code, so total time adds up with more GPUs.
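The additive behavior can be checked with a small simulation. This is a minimal sketch, not a real container API: `start_container` is a hypothetical stand-in that sleeps for a fixed time instead of calling `create_container(...).start()`, and the `start_seconds` value is an assumption chosen for illustration.

```python
import time


def start_container(gpu_id, start_seconds=0.01):
    """Hypothetical stand-in for creating and starting one GPU container."""
    time.sleep(start_seconds)  # pretend this is the container startup work
    return f"container-gpu{gpu_id}"


def start_sequentially(num_gpus, start_seconds=0.01):
    """Start one simulated container per GPU, one after another."""
    started = []
    t0 = time.perf_counter()
    for gpu_id in range(num_gpus):
        started.append(start_container(gpu_id, start_seconds))
    elapsed = time.perf_counter() - t0
    return started, elapsed
```

Calling `start_sequentially(4)` and then `start_sequentially(8)` should show the elapsed time roughly doubling, which is the linear growth described above.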
Understanding how resource setup time grows helps you design scalable machine learning pipelines and container orchestration strategies.
What if we started all GPU containers in parallel instead of a loop? How would the time complexity change?
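One way to explore this question is to simulate parallel starts with a thread pool. The sketch below reuses a hypothetical `start_container` stand-in (a sleep, not a real container call) and assumes the host can start every container at the same time. The total work is still one start per GPU, so it remains O(n); only the wall-clock time can approach the duration of a single start, and a real container daemon may serialize part of that work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def start_container(gpu_id, start_seconds=0.05):
    """Hypothetical stand-in for creating and starting one GPU container."""
    time.sleep(start_seconds)  # pretend this is the container startup work
    return f"container-gpu{gpu_id}"


def start_in_parallel(num_gpus, start_seconds=0.05):
    """Start all simulated containers at once, one worker thread per GPU."""
    t0 = time.perf_counter()
    # max_workers must be at least 1, even when num_gpus is 0
    with ThreadPoolExecutor(max_workers=max(1, num_gpus)) as pool:
        started = list(
            pool.map(lambda g: start_container(g, start_seconds), range(num_gpus))
        )
    elapsed = time.perf_counter() - t0
    return started, elapsed
```

With enough workers, the elapsed time for 4 simulated GPUs should stay close to one `start_seconds` interval rather than four.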
Practice
Solution
Step 1: Understand GPU role in containers
GPUs speed up computing tasks by handling parallel processing efficiently.
Step 2: Identify GPU support purpose
Enabling GPU support allows containers to access the host's GPU hardware for faster computation.
Final Answer:
To allow containers to use the host's GPU for faster computing -> Option D
Quick Check:
GPU support = faster computing [OK]
- Confusing GPU support with disk or memory changes
- Thinking GPU enables network access
- Assuming GPU support reduces container size
Solution
Step 1: Recall Docker GPU flag syntax
The official Docker flag to enable GPU support is --gpus.
Step 2: Verify other options
Options like --enable-gpu, --gpu-access, and --use-gpu are incorrect or do not exist.
Final Answer:
--gpus -> Option A
Quick Check:
Docker GPU flag = --gpus [OK]
- Using incorrect flag names like --enable-gpu
- Confusing GPU flag with network or volume flags
- Omitting the flag entirely
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi if the host has a compatible NVIDIA GPU and drivers installed?
Solution
Step 1: Understand the command purpose
The command runs a container with full GPU access and executes nvidia-smi to show GPU info.
Step 2: Check host requirements
If the host has a compatible NVIDIA GPU and drivers, nvidia-smi runs successfully inside the container.
Final Answer:
Displays the NVIDIA GPU status and driver information -> Option A
Quick Check:
Host GPU + drivers + --gpus = nvidia-smi output [OK]
- Assuming nvidia-smi is missing inside official CUDA image
- Ignoring host driver requirements
- Expecting GPU info without --gpus flag
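The smoke-test command from this question can also be assembled from a script. This is a minimal sketch: the function names are hypothetical, the image tag is the one used in the question, and the run helper assumes the `docker` CLI is on the PATH (it returns `None` otherwise).

```python
import shutil
import subprocess


def build_gpu_smoke_test(image="nvidia/cuda:11.0-base", gpus="all"):
    """Return the argument list for `docker run --gpus <gpus> <image> nvidia-smi`."""
    return ["docker", "run", "--gpus", gpus, image, "nvidia-smi"]


def run_gpu_smoke_test(image="nvidia/cuda:11.0-base", gpus="all"):
    """Run the smoke test if docker is available; return the completed process."""
    if shutil.which("docker") is None:
        return None  # docker CLI not found, nothing to run
    cmd = build_gpu_smoke_test(image, gpus)
    # capture_output keeps stdout/stderr so the caller can inspect them
    return subprocess.run(cmd, capture_output=True, text=True)
```

A zero return code with the GPU table in `stdout` corresponds to the expected answer; a non-zero code with text in `stderr` points to one of the host problems listed above.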
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi but get the error: 'docker: Error response from daemon: could not select device driver'. What is the most likely cause?
Solution
Step 1: Analyze the error message
The error indicates Docker cannot find a GPU device driver to assign to the container.
Step 2: Identify missing component
This usually happens if the NVIDIA Container Toolkit (the nvidia-container-toolkit package, which replaced the older nvidia-docker2 package) is not installed or configured on the host.
Final Answer:
The NVIDIA Container Toolkit is not installed on the host -> Option C
Quick Check:
Missing NVIDIA toolkit = device driver error [OK]
- Blaming Docker image for GPU support
- Assuming syntax error causes this message
- Thinking internet is required for this error
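The diagnosis above can be turned into a small helper that inspects the error text. This is a hedged sketch: the function names are hypothetical, and the list of executable names is an assumption about what a typical NVIDIA Container Toolkit install places on the PATH, so adjust it for your distribution.

```python
import shutil

# Assumption: executables commonly shipped with the NVIDIA Container Toolkit.
TOOLKIT_BINARIES = ("nvidia-ctk", "nvidia-container-runtime")


def toolkit_binaries_found():
    """Return the toolkit executables that are visible on the PATH."""
    return [name for name in TOOLKIT_BINARIES if shutil.which(name)]


def diagnose_gpu_error(stderr_text):
    """Map the 'could not select device driver' error to a likely cause."""
    if "could not select device driver" not in stderr_text:
        return None  # not the error this helper knows about
    if toolkit_binaries_found():
        return "Toolkit binaries found; check that Docker is configured to use them."
    return "NVIDIA Container Toolkit is probably not installed on the host."
```

Passing the `stderr` of a failed `docker run --gpus all ...` call to `diagnose_gpu_error` gives a first hint before checking the host by hand.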
Solution
Step 1: Understand GPU selection syntax
To limit a container to GPUs 0 and 1, Docker's --gpus flag takes a device= list of GPU IDs, written in this question as --gpus 'device=0,1'. The Docker documentation writes multi-device lists with an extra pair of double quotes, --gpus '"device=0,1"', so that the comma is not read as a field separator.
Step 2: Evaluate options
docker run --gpus 2 nvidia/cuda:11.0-base nvidia-smi requests any 2 GPUs but does not specify GPUs 0 and 1.
docker run --gpus 'count=2' nvidia/cuda:11.0-base nvidia-smi also requests a number of GPUs through count=2, not specific device IDs.
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi uses all GPUs.
Final Answer:
docker run --gpus 'device=0,1' nvidia/cuda:11.0-base nvidia-smi -> Option B
Quick Check:
Specify GPUs by device IDs with --gpus 'device=...' [OK]
- Using --gpus 2 without device IDs
- Using count=2, which sets how many GPUs to use, not which ones
- Assuming --gpus all limits GPUs
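The three ways of selecting GPUs discussed here (all, a count, specific IDs) can be sketched as one helper that builds the value passed to --gpus. The function name and parameters are hypothetical. The device list is wrapped in literal double quotes, following the form shown in the Docker documentation, on the assumption that the value is passed as a single argument without going through a shell.

```python
def gpus_value(device_ids=None, count=None):
    """Build the value for docker's --gpus flag.

    device_ids: iterable of GPU indices to expose, e.g. [0, 1]
    count: number of GPUs to request when specific IDs do not matter
    With neither argument, all GPUs are requested.
    """
    if device_ids is not None and count is not None:
        raise ValueError("choose either device_ids or count, not both")
    if device_ids is not None:
        ids = ",".join(str(i) for i in device_ids)
        # Literal double quotes keep the comma-separated IDs in one field
        return f'"device={ids}"'
    if count is not None:
        return str(count)
    return "all"
```

For example, `["docker", "run", "--gpus", gpus_value(device_ids=[0, 1]), "nvidia/cuda:11.0-base", "nvidia-smi"]` is an argument list that targets GPUs 0 and 1 only.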
