
GPU support in containers in MLOps - Step-by-Step Execution

Process Flow - GPU support in containers
  1. Start Container
  2. Check GPU Availability (continue if Yes)
  3. Use NVIDIA Container Toolkit
  4. Mount GPU Drivers & Libraries
  5. Run Container with GPU Access
  6. Container Uses GPU for Tasks
  7. Stop Container
This flow shows how a container is started with GPU support: GPU availability is checked, the NVIDIA Container Toolkit mounts the host's drivers and libraries, and GPU access is enabled inside the container.
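The flow above can be sketched as a small dry-run script. This is only an illustration under a few assumptions: the function names are hypothetical, the image tag is copied from the sample on this page (confirm it exists in your registry), and the `--rm` flag is added here so the container is removed on exit. The script prints the command it would run instead of executing it.

```shell
# Image tag taken from the sample on this page; adjust to a tag you can pull.
IMAGE="nvidia/cuda:12.0-base"

# Return 0 if the host driver answers, non-zero otherwise.
# `nvidia-smi -L` lists the GPUs the host driver can see.
host_has_gpu() {
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}

# Print the next action for a given availability answer ("yes" or "no").
next_action() {
    if [ "$1" = "yes" ]; then
        echo "docker run --rm --gpus all $IMAGE nvidia-smi"
    else
        echo "no GPU detected on host; not starting a GPU container"
    fi
}

if host_has_gpu; then
    gpu="yes"
else
    gpu="no"
fi

echo "GPU available on host: $gpu"
echo "Next action: $(next_action "$gpu")"
```

Printing the command keeps the sketch safe to run on a machine without Docker or a GPU; copy the printed line into a terminal once the host check reports "yes".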
Execution Sample
docker run --gpus all nvidia/cuda:12.0-base nvidia-smi
Runs a container with full GPU access and executes 'nvidia-smi' to show GPU status inside the container.
Process Table
| Step | Action | Command/Check | Result/Output |
|---|---|---|---|
| 1 | Start container with GPU flag | docker run --gpus all nvidia/cuda:12.0-base nvidia-smi | Container starts with GPU access enabled |
| 2 | Container checks for GPU devices | nvidia-smi | Lists GPU devices and their status |
| 3 | Container runs GPU-enabled task | Any CUDA program inside container | Task uses GPU for computation |
| 4 | Stop container | docker stop <container_id> | Container stops, GPU resources freed |
💡 Container stops after GPU tasks complete or user stops it manually
Status Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | Final |
|---|---|---|---|---|---|
| Container State | Not running | Running with GPU access | Running with GPU detected | Running with GPU task executing | Stopped |
| GPU Access | Unavailable | Enabled via --gpus flag | Confirmed by nvidia-smi output | Used by CUDA tasks | Released |
Key Moments - 3 Insights
Why do we need the '--gpus all' flag when running the container?
The '--gpus all' flag tells Docker to give the container access to all GPUs on the host. Without it, the container cannot see or use the GPU devices, as shown in step 1 of the process table.
What role does 'nvidia-smi' play inside the container?
'nvidia-smi' is a tool that lists GPU devices and their status. Running it inside the container (step 2) confirms that the container has access to the GPUs.
How does the container use GPU drivers and libraries from the host?
The NVIDIA Container Toolkit mounts the necessary GPU drivers and libraries from the host into the container automatically when using the '--gpus' flag, enabling GPU tasks inside the container as seen in step 3.
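To make the 'nvidia-smi' check scriptable rather than read by eye, one option is to ask nvidia-smi for CSV output and count the rows. The sketch below assumes the `--query-gpu` and `--format=csv,noheader` options of nvidia-smi; the helper name `count_gpus` is hypothetical, and the docker command is shown only as a comment so the block is safe to run anywhere.

```shell
# Count GPUs from CSV rows on stdin.
# Expected input: one row per GPU, starting with the GPU index and a comma,
# as printed by: nvidia-smi --query-gpu=index,name --format=csv,noheader
count_gpus() {
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so `|| true` keeps the function's exit status clean.
    grep -c '^[0-9][0-9]*,' || true
}

# Example usage (not executed here, requires Docker and a GPU host):
#   docker run --rm --gpus all nvidia/cuda:12.0-base \
#       nvidia-smi --query-gpu=index,name --format=csv,noheader | count_gpus
```

A result of 0 suggests the container cannot see any GPU, which usually points back to a missing '--gpus' flag or a host setup problem.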
Visual Quiz - 3 Questions
Test your understanding
According to the process table, which command confirms GPU availability inside the container?
A. nvidia-smi
B. docker stop
C. docker run --gpus all
D. Any CUDA program
💡 Hint
Check step 2 in the process table, where GPU devices are listed.
At which step does the container start using the GPU for computation?
A. Step 1
B. Step 3
C. Step 2
D. Step 4
💡 Hint
Refer to step 3, where GPU-enabled tasks run inside the container.
If the '--gpus all' flag is omitted, what will happen according to the status tracker?
A. Container State becomes 'Running with GPU access'
B. GPU Access becomes 'Used by CUDA tasks'
C. GPU Access remains 'Unavailable'
D. Container stops immediately
💡 Hint
Look at how the 'GPU Access' variable changes in the status tracker after step 1.
Concept Snapshot
GPU support in containers:
- Use 'docker run --gpus all' to enable GPU access
- NVIDIA Container Toolkit mounts drivers inside container
- Run 'nvidia-smi' inside container to verify GPU
- GPU tasks run inside container using host GPU
- Stop container to release GPU resources
Full Transcript
To use a GPU inside a container, start the container with the '--gpus all' flag. With this flag, the NVIDIA Container Toolkit mounts the necessary drivers and libraries from the host into the container. Inside the container, running 'nvidia-smi' confirms GPU availability by listing GPU devices. GPU-enabled programs can then run inside the container using the host's GPU. Finally, stopping the container frees GPU resources. This process lets containers use GPUs for machine learning and other compute tasks.

Practice

1. What is the main purpose of enabling GPU support in containers?
easy
A. To reduce the container's memory usage
B. To increase the container's disk space
C. To enable network access inside the container
D. To allow containers to use the host's GPU for faster computing

Solution

  1. Step 1: Understand GPU role in containers

    GPUs speed up computing tasks by handling parallel processing efficiently.
  2. Step 2: Identify GPU support purpose

    Enabling GPU support allows containers to access the host's GPU hardware for faster computation.
  3. Final Answer:

    To allow containers to use the host's GPU for faster computing -> Option D
  4. Quick Check:

    GPU support = faster computing [OK]
Hint: GPU support means using host GPU inside container [OK]
Common Mistakes:
  • Confusing GPU support with disk or memory changes
  • Thinking GPU enables network access
  • Assuming GPU support reduces container size
2. Which Docker command flag is used to enable GPU support when running a container?
easy
A. --gpus
B. --enable-gpu
C. --gpu-access
D. --use-gpu

Solution

  1. Step 1: Recall Docker GPU flag syntax

    The official Docker flag to enable GPU support is --gpus.
  2. Step 2: Verify other options

    Options like --enable-gpu, --gpu-access, and --use-gpu are incorrect or do not exist.
  3. Final Answer:

    --gpus -> Option A
  4. Quick Check:

    Docker GPU flag = --gpus [OK]
Hint: Docker GPU flag is exactly --gpus [OK]
Common Mistakes:
  • Using incorrect flag names like --enable-gpu
  • Confusing GPU flag with network or volume flags
  • Omitting the flag entirely
3. What will be the output of the command docker run --gpus all nvidia/cuda:11.0-base nvidia-smi if the host has a compatible NVIDIA GPU and drivers installed?
medium
A. Displays the NVIDIA GPU status and driver information
B. Shows an error: 'nvidia-smi command not found'
C. Runs the container but shows no GPU information
D. Fails with 'GPU not accessible' error

Solution

  1. Step 1: Understand the command purpose

    The command runs a container with full GPU access and executes nvidia-smi to show GPU info.
  2. Step 2: Check host requirements

    If the host has compatible NVIDIA GPU and drivers, nvidia-smi runs successfully inside the container.
  3. Final Answer:

    Displays the NVIDIA GPU status and driver information -> Option A
  4. Quick Check:

    Host GPU + drivers + --gpus = nvidia-smi output [OK]
Hint: If host GPU ready, nvidia-smi shows GPU info inside container [OK]
Common Mistakes:
  • Assuming nvidia-smi is missing inside official CUDA image
  • Ignoring host driver requirements
  • Expecting GPU info without --gpus flag
4. You run docker run --gpus all nvidia/cuda:11.0-base nvidia-smi but get the error: 'docker: Error response from daemon: could not select device driver'. What is the most likely cause?
medium
A. The container command syntax is incorrect
B. The Docker image does not support GPUs
C. The NVIDIA Container Toolkit is not installed on the host
D. The host has no internet connection

Solution

  1. Step 1: Analyze the error message

    The error indicates Docker cannot find a GPU device driver to assign to the container.
  2. Step 2: Identify missing component

    This usually happens when the NVIDIA Container Toolkit (the nvidia-container-toolkit package, which replaced the older nvidia-docker2 package) is not installed or not configured on the host.
  3. Final Answer:

    The NVIDIA Container Toolkit is not installed on the host -> Option C
  4. Quick Check:

    Missing NVIDIA toolkit = device driver error [OK]
Hint: Device driver error means NVIDIA Container Toolkit missing [OK]
Common Mistakes:
  • Blaming Docker image for GPU support
  • Assuming syntax error causes this message
  • Thinking internet is required for this error
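When this error appears, a quick host-side check can narrow down which piece is missing. The sketch below only reports whether three commands are on the PATH; `check_tool` is a hypothetical helper, and `nvidia-ctk` is assumed to be the CLI that ships with the NVIDIA Container Toolkit. The remediation commands are left as comments because they need root and a real installation.

```shell
# Report whether a command is available on this host.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool docker        # container engine
check_tool nvidia-smi    # host NVIDIA driver utilities
check_tool nvidia-ctk    # CLI from the NVIDIA Container Toolkit

# If nvidia-ctk is found but Docker still reports the error, the NVIDIA runtime
# may not be registered with Docker yet. NVIDIA's install guide uses:
#   sudo nvidia-ctk runtime configure --runtime=docker
#   sudo systemctl restart docker
```

If `nvidia-ctk` is reported missing, installing the toolkit from NVIDIA's package repository is the likely fix; if `nvidia-smi` is missing, the host driver needs attention first.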
5. You want to run a container with access to only GPUs 0 and 1 on a host with 4 GPUs. Which Docker run command correctly limits GPU access?
hard
A. docker run --gpus 2 nvidia/cuda:11.0-base nvidia-smi
B. docker run --gpus '"device=0,1"' nvidia/cuda:11.0-base nvidia-smi
C. docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
D. docker run --gpus 'count=2' nvidia/cuda:11.0-base nvidia-smi

Solution

  1. Step 1: Understand GPU selection syntax

    To limit a container to GPUs 0 and 1, pass their IDs with --gpus '"device=0,1"'. The inner double quotes matter: Docker splits the flag value on commas, so an unquoted device=0,1 is read as device=0 plus a GPU count of 1 and is rejected.
  2. Step 2: Evaluate options

    Options A (--gpus 2) and D (--gpus 'count=2') both request any 2 GPUs without naming which ones. Option C (--gpus all) exposes all 4 GPUs.
  3. Final Answer:

    docker run --gpus '"device=0,1"' nvidia/cuda:11.0-base nvidia-smi -> Option B
  4. Quick Check:

    Specify GPUs by device IDs with --gpus '"device=..."' [OK]
Hint: Use --gpus '"device=0,1"' to pick specific GPUs [OK]
Common Mistakes:
  • Using --gpus 2 or count=2, which sets how many GPUs but not which ones
  • Dropping the inner double quotes around device=0,1
  • Assuming --gpus all limits GPUs
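Quoting is the easy part to get wrong when selecting several GPUs, because Docker parses the '--gpus' value as comma-separated fields and Docker's documentation wraps a multi-GPU device list in an extra pair of double quotes. A minimal sketch of a helper that builds that value is shown below; `gpus_device_value` is a hypothetical name, and the script prints the resulting command rather than running it.

```shell
# Build the --gpus value for a list of GPU indices.
# Example: gpus_device_value 0 1  prints  "device=0,1"  (double quotes included).
gpus_device_value() {
    ids=""
    for id in "$@"; do
        if [ -z "$ids" ]; then
            ids="$id"
        else
            ids="$ids,$id"
        fi
    done
    # The literal double quotes keep the comma-separated IDs in one field.
    printf '"device=%s"' "$ids"
}

value=$(gpus_device_value 0 1)

# Single quotes protect the inner double quotes from the shell.
echo "docker run --rm --gpus '$value' nvidia/cuda:11.0-base nvidia-smi"
```

With a single GPU the inner quotes are harmless, so the same helper can be used for one index or many.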