
GPU support in containers in MLOps - Step-by-Step Execution

Process Flow - GPU support in containers
Start Container
  → Check GPU Availability
  → (GPU available: yes)
  → Use NVIDIA Container Toolkit
  → Mount GPU Drivers & Libraries
  → Run Container with GPU Access
  → Container Uses GPU for Tasks
  → Stop Container
This flow shows how a container is started with GPU support by checking GPU availability, using NVIDIA toolkit to mount drivers, and enabling GPU access inside the container.
Execution Sample
MLOps
docker run --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
Runs a container with full GPU access and executes 'nvidia-smi' to show GPU status inside the container.
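The sample above can be wrapped in a small guarded script. This is a sketch: the image tag nvidia/cuda:12.0.0-base-ubuntu22.04 is an assumption, so check Docker Hub for a tag that exists, and the docker command only runs when Docker is installed on the host.

```shell
#!/bin/sh
# Sketch: run a CUDA base image with full GPU access and print GPU status.
# The image tag below is an assumption; pick one that exists on Docker Hub.
IMAGE="nvidia/cuda:12.0.0-base-ubuntu22.04"

# --rm removes the container automatically once nvidia-smi exits.
RUN_CMD="docker run --rm --gpus all $IMAGE nvidia-smi"
echo "Command: $RUN_CMD"

# Execute only when Docker is present on this host.
if command -v docker >/dev/null 2>&1; then
    $RUN_CMD
fi
```

Printing the command first makes the script safe to run on machines without Docker or a GPU, while still documenting the exact invocation.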
Process Table
| Step | Action | Command/Check | Result/Output |
|------|--------|---------------|---------------|
| 1 | Start container with GPU flag | docker run --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi | Container starts with GPU access enabled |
| 2 | Container checks for GPU devices | nvidia-smi | Lists GPU devices and their status |
| 3 | Container runs GPU-enabled task | Any CUDA program inside container | Task uses GPU for computation |
| 4 | Stop container | docker stop <container_id> | Container stops, GPU resources freed |
💡 Container stops after GPU tasks complete or user stops it manually
Status Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| Container State | Not running | Running with GPU access | Running, GPU detected | Running, GPU task executing | Stopped |
| GPU Access | Unavailable | Enabled via --gpus flag | Confirmed by nvidia-smi output | Used by CUDA tasks | Released |
Key Moments - 3 Insights
Why do we need the '--gpus all' flag when running the container?
The '--gpus all' flag tells Docker to provide the container access to all GPUs on the host. Without it, the container cannot see or use the GPU devices, as shown in step 1 of the execution table.
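Docker's '--gpus' flag also accepts a device count or specific device indices, not just 'all'. A sketch of the common forms, with the image tag again an assumption:

```shell
#!/bin/sh
# '--gpus' accepts 'all', a count, or specific devices.
# Image tag is an assumption; use a tag that exists on Docker Hub.
IMAGE="nvidia/cuda:12.0.0-base-ubuntu22.04"

# Expose only the GPU at index 0:
CMD_ONE="docker run --rm --gpus device=0 $IMAGE nvidia-smi"

# Expose any two GPUs on the host:
CMD_TWO="docker run --rm --gpus 2 $IMAGE nvidia-smi"

echo "$CMD_ONE"
echo "$CMD_TWO"
```

Limiting a container to specific GPUs is useful on shared training machines, where several jobs split the available devices.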
What role does 'nvidia-smi' play inside the container?
'nvidia-smi' is a tool that lists GPU devices and their status. Running it inside the container (step 2) confirms that the container has access to the GPUs.
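Beyond the default status table, nvidia-smi supports machine-readable queries, which are handy for scripted health checks. A guarded sketch that only queries when a driver is present:

```shell
#!/bin/sh
# Sketch: query GPU name, total memory, and utilization as CSV.
# Runs the query only when nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total,utilization.gpu --format=csv
    GPU_TOOL=present
else
    GPU_TOOL=absent
fi
echo "nvidia-smi: $GPU_TOOL"
```

The same query works inside a GPU-enabled container, which makes it a convenient readiness probe for ML serving workloads.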
How does the container use GPU drivers and libraries from the host?
The NVIDIA Container Toolkit mounts the necessary GPU drivers and libraries from the host into the container automatically when using the '--gpus' flag, enabling GPU tasks inside the container as seen in step 3.
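For the toolkit to do this, it must be registered as a Docker runtime once on the host. A sketch of that one-time setup, guarded so it only runs where the toolkit is installed (the commands need root):

```shell
#!/bin/sh
# Sketch: one-time host setup for the NVIDIA Container Toolkit.
# Only attempted when nvidia-ctk is installed; requires root privileges.
if command -v nvidia-ctk >/dev/null 2>&1; then
    TOOLKIT=present
    # Registers the 'nvidia' runtime in /etc/docker/daemon.json.
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
else
    TOOLKIT=absent
    echo "nvidia-ctk not installed; install the NVIDIA Container Toolkit first"
fi
echo "toolkit: $TOOLKIT"
```

Without this registration, 'docker run --gpus' fails with an unknown-runtime error even on hosts that have working NVIDIA drivers.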
Visual Quiz - 3 Questions
Test your understanding
Looking at the process table, which command confirms GPU availability inside the container?
A. nvidia-smi
B. docker stop
C. docker run --gpus all
D. Any CUDA program
💡 Hint
Check step 2 in the process table, where GPU devices are listed.
At which step does the container start using the GPU for computation?
A. Step 1
B. Step 3
C. Step 2
D. Step 4
💡 Hint
Refer to step 3 where GPU-enabled tasks run inside the container.
If the '--gpus all' flag is omitted, what will happen according to the status tracker?
A. Container State becomes 'Running with GPU access'
B. GPU Access becomes 'Used by CUDA tasks'
C. GPU Access remains 'Unavailable'
D. Container stops immediately
💡 Hint
Look at the 'GPU Access' row in the status tracker after step 1.
Concept Snapshot
GPU support in containers:
- Use 'docker run --gpus all' to enable GPU access
- NVIDIA Container Toolkit mounts drivers inside container
- Run 'nvidia-smi' inside container to verify GPU
- GPU tasks run inside container using host GPU
- Stop container to release GPU resources
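The snapshot above can be sketched end to end as one script: start a GPU container in the background, verify the GPU inside it, then stop it to release resources. The image tag and container name are assumptions, and the Docker steps only run when Docker is installed:

```shell
#!/bin/sh
# End-to-end sketch of the container GPU lifecycle.
# Image tag and container name are assumptions for illustration.
IMAGE="nvidia/cuda:12.0.0-base-ubuntu22.04"
NAME="gpu-demo"

if command -v docker >/dev/null 2>&1; then
    # Step 1: start with GPU access, kept alive so we can exec into it.
    docker run -d --name "$NAME" --gpus all "$IMAGE" sleep 300
    # Step 2: verify GPU visibility inside the running container.
    docker exec "$NAME" nvidia-smi
    # Step 4: stop and remove; host GPU resources are freed.
    docker stop "$NAME" && docker rm "$NAME"
else
    echo "docker not available; commands shown for reference only"
fi
```

Running the container detached with a sleep keeps it alive long enough to exec GPU checks or tasks into it, mirroring steps 1 through 4 of the process table.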
Full Transcript
To use GPU inside containers, start by running the container with the '--gpus all' flag. This flag enables GPU access by mounting necessary drivers and libraries using the NVIDIA Container Toolkit. Inside the container, running 'nvidia-smi' confirms GPU availability by listing GPU devices. Then, GPU-enabled programs can run inside the container using the host's GPU. Finally, stopping the container frees GPU resources. This process ensures containers can leverage GPUs for machine learning or other compute tasks.