
Kubernetes for ML workloads in MLOps - Step-by-Step Execution

Process Flow - Kubernetes for ML workloads
1. Prepare ML Model Container
2. Create Kubernetes Deployment
3. Kubernetes Scheduler Assigns Pod
4. Pod Runs ML Container
5. Model Serves Predictions
6. Monitor & Scale Pods Based on Load
7. Update Model or Config
This flow shows how an ML model container is deployed on Kubernetes, scheduled as pods, used to serve predictions, and scaled with demand.
Execution Sample
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: model
        image: mlmodel:latest
        ports:
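        - containerPort: 5000
        resources:
          requests:
            cpu: 500m  # a CPU request is needed for CPU-based autoscaling (Step 5); value is illustrative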
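This YAML deploys 2 replicas of an ML model container, each listening on container port 5000. Declaring containerPort does not by itself make the model reachable; a Service, which this manifest does not define, routes prediction requests to the pods.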
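Step 4 of the process relies on a Service in front of these pods. A minimal sketch of one is below, written in Python as a dict and printed as JSON, since `kubectl apply -f` accepts JSON manifests as well as YAML. The Service name suffix `-svc` and the Service port 80 are illustrative assumptions; only the `app: ml-model` label and port 5000 come from the Deployment above.

```python
import json


def build_service_manifest(app_label: str = "ml-model",
                           service_port: int = 80,
                           target_port: int = 5000) -> dict:
    """Build a ClusterIP Service that load balances across pods labeled app=<app_label>.

    The "-svc" name suffix and service port 80 are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{app_label}-svc"},
        "spec": {
            # Must match the pod template labels in the Deployment (app: ml-model).
            "selector": {"app": app_label},
            "ports": [{
                "protocol": "TCP",
                "port": service_port,       # port clients call on the Service
                "targetPort": target_port,  # containerPort the model listens on
            }],
            # ClusterIP is the default type: reachable only from inside the cluster.
            "type": "ClusterIP",
        },
    }


if __name__ == "__main__":
    # kubectl apply -f accepts JSON manifests as well as YAML.
    print(json.dumps(build_service_manifest(), indent=2))
```

Saving the output to a file (for example `python make_service.py > ml-model-service.json`, file names hypothetical) and running `kubectl apply -f ml-model-service.json` would create the Service.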
Process Table
Step | Action | Kubernetes Component | Result
1 | Apply Deployment YAML | kubectl | Deployment 'ml-model' created with 2 replicas
2 | Scheduler assigns pods to nodes | Kubernetes Scheduler | 2 pods scheduled on available nodes
3 | Pods start containers | Kubelet | ML model containers running and listening on port 5000
4 | Service routes traffic | Kubernetes Service | Requests to model are load balanced across pods
5 | Monitor load | Horizontal Pod Autoscaler | Pods scaled up/down based on CPU usage
6 | Update Deployment with new image | kubectl | Rolling update triggers new pods with updated model
7 | Old pods terminated | Kubernetes Controller | Deployment updated successfully
8 | End | - | Model serving stable with desired replicas
💡 Deployment reaches desired state with pods running and serving predictions
Status Tracker
Variable | Start | After Step 1 | After Step 3 | After Step 5 | Final
Deployment replicas | 0 | 2 | 2 | 3 (scaled up) | 3
Pods running | 0 | 0 | 2 | 3 | 3
Model version | none | v1 (mlmodel:latest) | v1 | v1 | v2 (after update)
Key Moments - 3 Insights
Why do we see pods starting only after the scheduler assigns them?
Because the scheduler first decides which node each pod will run on (Step 2); only then does the kubelet on that node start the containers (Step 3). This ensures each pod runs on a node with enough free resources for it.
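The real kube-scheduler works in two phases: it filters out nodes that cannot run the pod, then scores the remaining ones. The toy model below sketches that idea under simplifying assumptions: only CPU, memory, and GPU requests are checked, and the score simply prefers the node with the most CPU left. The real scheduler uses plugins and many more criteria (taints, affinity, spreading), and the node names and request sizes here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int        # unrequested CPU, in millicores
    free_mem_mi: int       # unrequested memory, in MiB
    free_gpus: int = 0


@dataclass
class PodRequest:
    cpu_m: int
    mem_mi: int
    gpus: int = 0


def fits(node: Node, pod: PodRequest) -> bool:
    """Filter step: the node must have enough unrequested resources for the pod."""
    return (node.free_cpu_m >= pod.cpu_m
            and node.free_mem_mi >= pod.mem_mi
            and node.free_gpus >= pod.gpus)


def score(node: Node, pod: PodRequest) -> int:
    """Score step (simplified): prefer the node with the most CPU left after placement."""
    return node.free_cpu_m - pod.cpu_m


def schedule(pod: PodRequest, nodes: List[Node]) -> Optional[str]:
    """Return the chosen node name, or None if no node fits (the pod would stay Pending)."""
    candidates = [n for n in nodes if fits(n, pod)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(n, pod)).name


nodes = [Node("cpu-node", 4000, 8192), Node("gpu-node", 2000, 16384, free_gpus=1)]
print(schedule(PodRequest(cpu_m=500, mem_mi=1024), nodes))          # cpu-node
print(schedule(PodRequest(cpu_m=500, mem_mi=1024, gpus=1), nodes))  # gpu-node
print(schedule(PodRequest(cpu_m=8000, mem_mi=1024), nodes))         # None
```

Only after a node is chosen does the kubelet on that node pull the image and start the container, which is why Step 3 always follows Step 2.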
How does scaling happen automatically when load increases?
The Horizontal Pod Autoscaler monitors the pods' CPU usage (Step 5) and adds replicas when average utilization rises above its target, as shown by the increase from 2 to 3 pods in the status tracker.
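The Kubernetes documentation gives the core HPA rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), where CPU utilization is measured as a percentage of the pods' CPU requests, and no scaling happens when the ratio is within a tolerance (0.1 by default). The sketch below implements just that rule plus min/max clamping; it ignores the scale-down stabilization window and unready pods, and the default maximum of 10 is an arbitrary assumption.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,      # arbitrary default for this sketch
                     tolerance: float = 0.1) -> int:
    """Core HPA scaling rule (simplified: no stabilization window, no unready pods)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(2, 75, 50))                    # 3: 2 pods at 75% vs a 50% target
print(desired_replicas(2, 52, 50))                    # 2: within the 10% tolerance
print(desired_replicas(8, 100, 50, max_replicas=10))  # 10: clamped to maxReplicas
```

With 2 pods averaging 75% CPU against a 50% target, the rule yields 3 replicas, matching the 2 to 3 step in the status tracker (the utilization figures are illustrative).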
What happens during a rolling update of the ML model?
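When a new image is applied (Step 6), Kubernetes creates new pods with the updated model and gradually terminates old pods (Step 7), so some pods keep serving throughout the update. The pace is set by the Deployment's maxSurge and maxUnavailable settings, which both default to 25%.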
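During a rolling update, maxSurge caps how many extra pods may exist and maxUnavailable caps how many desired pods may be unavailable; when given as percentages, Kubernetes rounds maxSurge up and maxUnavailable down. The sketch below computes the resulting bounds. It does not model the corner case where both values resolve to zero.

```python
import math
from typing import Tuple, Union


def resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Turn an absolute number or a percentage string like "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(replicas: int,
                   max_surge: Union[int, str] = "25%",
                   max_unavailable: Union[int, str] = "25%") -> Tuple[int, int]:
    """Return (max total pods, min available pods) allowed during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)                # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    return replicas + surge, replicas - unavailable


print(rollout_bounds(2))   # (3, 2)
print(rollout_bounds(10))  # (13, 8)
```

For the 2-replica ml-model Deployment with defaults, Kubernetes may run up to 3 pods and must keep 2 available, so it starts a new pod before terminating an old one. The update itself could be triggered with `kubectl set image deployment/ml-model model=mlmodel:v2` (the `v2` tag is hypothetical) and followed with `kubectl rollout status deployment/ml-model`. Truly zero-downtime updates also depend on a readiness probe, so traffic only reaches new pods once the model has loaded.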
Visual Quiz - 3 Questions
Test your understanding
According to the process table, at which step do pods start running the ML containers?
A. Step 2
B. Step 5
C. Step 3
D. Step 1
💡 Hint
Check the 'Pods start containers' action in the process table at Step 3
According to the status tracker, how many pods are running after scaling?
A. 2
B. 3
C. 1
D. 0
💡 Hint
Look at the 'Pods running' row after Step 5 in the status tracker
If the Deployment YAML changes the image to a new version, what happens next according to the process table?
A. Rolling update triggers new pods with updated model
B. Nothing changes until manual restart
C. Pods are immediately deleted
D. Scheduler assigns pods to nodes
💡 Hint
See Step 6 in the process table about updating the Deployment with a new image
Concept Snapshot
Kubernetes for ML workloads:
- Package ML model as container image
- Create Deployment YAML with replicas
- Apply YAML to create pods running model
- Use Service to route prediction requests
- Autoscale pods based on load
- Update Deployment for new model versions
- Rolling updates avoid downtime
Full Transcript
This visual execution shows how Kubernetes manages ML workloads by deploying containerized models as pods. First, the ML model is packaged into a container image. Then a Deployment YAML specifies how many replicas to run. Applying this YAML creates a Deployment resource. The Kubernetes scheduler assigns pods to nodes, and kubelets start the containers. A Service load balances prediction requests to pods. The Horizontal Pod Autoscaler monitors load and scales pods up or down automatically. When a new model version is available, updating the Deployment triggers a rolling update, replacing old pods with new ones without downtime. Variables like pod count and model version change step-by-step, helping beginners understand the process clearly.

Practice

1. What is the primary Kubernetes resource used to run a one-time ML training task?
easy
A. Job
B. Deployment
C. Service
D. ConfigMap

Solution

  1. Step 1: Understand Kubernetes resource types

    A Job runs its pods until the required number finish successfully, which suits run-to-completion tasks such as ML training.
  2. Step 2: Match resource to ML training task

    Since training is a one-time batch task, Job is the correct resource.
  3. Final Answer:

    Job -> Option A
  4. Quick Check:

    One-time ML training = Job [OK]
Hint: Use Job for one-time tasks like training [OK]
Common Mistakes:
  • Choosing Deployment which is for long-running services
  • Confusing a Service, which routes network traffic, with a workload resource
  • Using ConfigMap which stores config data only
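One concrete reason a Deployment is the wrong tool for training: Kubernetes only accepts restartPolicy Always in a Deployment's pod template, while a Job's pod template must use Never or OnFailure. The sketch below encodes those rules and the kubelet's restart behavior for each policy; the function names are hypothetical and only these two workload kinds are modeled.

```python
ALLOWED_RESTART_POLICIES = {
    "Deployment": {"Always"},          # long-running serving pods
    "Job": {"Never", "OnFailure"},     # run-to-completion pods
}


def policy_allowed(kind: str, policy: str) -> bool:
    """Check whether a pod template restartPolicy is valid for the given workload kind."""
    if kind not in ALLOWED_RESTART_POLICIES:
        raise ValueError(f"kind not covered by this sketch: {kind}")
    return policy in ALLOWED_RESTART_POLICIES[kind]


def kubelet_restarts(policy: str, exit_code: int) -> bool:
    """Whether the kubelet restarts a container in place after it exits."""
    if policy == "Always":
        return True
    if policy == "OnFailure":
        return exit_code != 0
    return False  # "Never"


print(kubelet_restarts("Always", 0))     # True: a finished training script would run again
print(kubelet_restarts("OnFailure", 0))  # False: a successful run stays completed
```

Under a Deployment, a training script that exits successfully would simply be restarted, so the "one-time" task would never end.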
2. Which of the following is the correct YAML snippet to request 2 GPUs in a Kubernetes pod spec?
easy
A. resources: requests: cpu: 2
B. resources: limits: memory: 2Gi
C. resources: limits: nvidia.com/gpu: 2
D. resources: requests: gpu: 2

Solution

  1. Step 1: Identify GPU resource naming in Kubernetes

    GPUs are requested through a vendor-specific extended resource name such as nvidia.com/gpu, which the NVIDIA device plugin advertises on GPU nodes.
  2. Step 2: Check correct YAML structure for limits

    GPUs must be specified under limits. A GPU request is optional and, if set, must equal the limit; a request without a limit (as in option D) is invalid.
  3. Final Answer:

    resources: limits: nvidia.com/gpu: 2 -> Option C
  4. Quick Check:

    GPU request uses nvidia.com/gpu under limits [OK]
Hint: GPU requests use 'limits' with 'nvidia.com/gpu' key [OK]
Common Mistakes:
  • Using 'gpu' instead of 'nvidia.com/gpu'
  • Setting the GPU only under requests, without a matching limit
  • Confusing CPU or memory keys with GPU
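The GPU rules from the solution can be written as a small check over a container's `resources` block. This is a hypothetical helper, not part of kubectl or the Kubernetes API; it covers only the cases discussed in this question (the generic `gpu` key, requests without limits, and unequal request and limit).

```python
from typing import Dict, Optional

GPU_KEY = "nvidia.com/gpu"


def validate_gpu_resources(resources: Dict[str, Dict[str, object]]) -> Optional[str]:
    """Return an error message, or None if the GPU settings follow the rules above."""
    limits = resources.get("limits", {})
    requests = resources.get("requests", {})
    if "gpu" in limits or "gpu" in requests:
        return "use the vendor resource name 'nvidia.com/gpu', not 'gpu'"
    if GPU_KEY in requests and GPU_KEY not in limits:
        return "GPU requests require a matching GPU limit"
    if GPU_KEY in requests and requests[GPU_KEY] != limits[GPU_KEY]:
        return "GPU request and limit must be equal"
    return None


print(validate_gpu_resources({"limits": {"nvidia.com/gpu": 2}}))  # None (option C)
print(validate_gpu_resources({"requests": {"gpu": 2}}))           # wrong key (option D)
```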
3. Given this Kubernetes Job YAML snippet, what will happen when applied?
apiVersion: batch/v1
kind: Job
metadata:
  name: ml-train
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: ml-image:latest
        command: ["python", "train.py"]
      restartPolicy: Never
  backoffLimit: 3
medium
A. The Job runs the training once and retries up to 3 times on failure
B. The Job runs continuously without stopping
C. The Job will fail immediately due to missing restartPolicy
D. The Job creates a Deployment instead of a batch task

Solution

  1. Step 1: Understand Job behavior with backoffLimit

    backoffLimit sets how many times failed pods are retried before the Job is marked as failed (the default is 6).
  2. Step 2: Check restartPolicy and command

    restartPolicy: Never means the kubelet does not restart a failed container in place; instead, the Job controller creates a new pod for each retry.
  3. Final Answer:

    The Job runs the training once and retries up to 3 times on failure -> Option A
  4. Quick Check:

    Job with backoffLimit retries 3 times [OK]
Hint: backoffLimit controls retry count for Job failures [OK]
Common Mistakes:
  • Thinking Job runs continuously like Deployment
  • Assuming restartPolicy: Never causes immediate failure
  • Confusing Job with Deployment resource
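According to the Kubernetes Jobs documentation, failed pods are recreated with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes. The sketch below uses those documented values to show the retry timeline for the ml-train Job; actual timings can vary slightly, and the function names are hypothetical.

```python
from typing import List


def retry_delays(backoff_limit: int, base_s: int = 10, cap_s: int = 360) -> List[int]:
    """Delay in seconds before each retry: 10, 20, 40, ... capped at six minutes."""
    return [min(base_s * 2 ** i, cap_s) for i in range(backoff_limit)]


def max_attempts(backoff_limit: int) -> int:
    """The first run plus up to backoff_limit retries."""
    return backoff_limit + 1


print(retry_delays(3))  # [10, 20, 40]
print(max_attempts(3))  # 4
```

So with backoffLimit: 3, train.py can run at most 4 times before the Job is marked failed.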
4. You deployed an ML model with a Deployment but the pods keep restarting. Which is the most likely cause?
medium
A. The ConfigMap is not mounted
B. The Deployment spec is missing replicas field
C. The Service is not exposing the Deployment
D. The container image is missing or incorrect

Solution

  1. Step 1: Analyze pod restart reasons

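    Repeated restarts (CrashLoopBackOff) mean the container starts and then exits, commonly because the image is the wrong build or its command fails. A missing image, by contrast, shows up as ErrImagePull or ImagePullBackOff, and the container never starts.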
  2. Step 2: Check other options relevance

    A missing replicas field defaults to 1, a Service does not affect pod health, and ConfigMap problems cause configuration errors but do not always crash the container.
  3. Final Answer:

    The container image is missing or incorrect -> Option D
  4. Quick Check:

    Pod restarts usually mean the container is crashing [OK]
Hint: Pod restarts often mean container image or command error [OK]
Common Mistakes:
  • Assuming missing replicas causes restarts
  • Confusing Service exposure with pod health
  • Thinking ConfigMap absence always crashes pods
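When debugging, the status reason shown by `kubectl get pods` or `kubectl describe pod` narrows the cause quickly. The mapping below is a hypothetical triage helper built on real Kubernetes reason strings; the explanations are common causes, not an exhaustive diagnosis.

```python
# Hypothetical triage table: Kubernetes status reason -> common cause.
LIKELY_CAUSE = {
    "ErrImagePull": "image name/tag wrong or registry unreachable; container never starts",
    "ImagePullBackOff": "kubelet is backing off after failed image pulls; container never starts",
    "CrashLoopBackOff": "container starts then exits repeatedly: bad command, app error, or wrong image build",
    "OOMKilled": "container exceeded its memory limit (common when loading large models)",
    "CreateContainerConfigError": "a referenced ConfigMap or Secret (or key) is missing",
}


def triage(reason: str) -> str:
    """Return a likely cause for a pod status reason, or a generic next step."""
    return LIKELY_CAUSE.get(
        reason,
        "unknown reason; check events in `kubectl describe pod` and `kubectl logs --previous`",
    )


print(triage("CrashLoopBackOff"))
```

For the restarting pods in this question, `kubectl logs --previous` shows the output of the last crashed container, which usually reveals whether the image or its command is at fault.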
5. You want to deploy an ML model serving system that automatically scales based on CPU usage. Which Kubernetes resource and feature combination is best?
hard
A. DaemonSet to run one pod per node
B. Deployment with Horizontal Pod Autoscaler (HPA)
C. StatefulSet with persistent volumes
D. Job with backoffLimit set to 5

Solution

  1. Step 1: Identify resource for long-running model serving

    Deployment manages long-running pods and supports updates.
  2. Step 2: Choose scaling feature for CPU-based autoscaling

    Horizontal Pod Autoscaler (HPA) automatically adjusts pod count based on CPU usage.
  3. Final Answer:

    Deployment with Horizontal Pod Autoscaler (HPA) -> Option B
  4. Quick Check:

    Use Deployment + HPA for scalable model serving [OK]
Hint: Use Deployment + HPA for auto-scaling model serving [OK]
Common Mistakes:
  • Using Job which is for batch tasks, not serving
  • Choosing StatefulSet, which is meant for stateful apps; this option also includes no autoscaling
  • Choosing DaemonSet, whose pod count follows the number of nodes rather than the load
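A minimal sketch of the winning combination: an autoscaling/v2 HorizontalPodAutoscaler that targets the ml-model Deployment, built as a dict and printed as JSON for `kubectl apply -f`. The replica bounds (2 to 5) and the 50% CPU target are illustrative assumptions. It only takes effect if the cluster runs a metrics source such as metrics-server and the pods declare CPU requests.

```python
import json


def build_hpa_manifest(deployment: str = "ml-model",
                       min_replicas: int = 2,          # illustrative bounds
                       max_replicas: int = 5,
                       target_cpu_percent: int = 50) -> dict:
    """Build an autoscaling/v2 HPA that scales a Deployment on average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            # Which workload to scale.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # Percentage of the pods' CPU requests, averaged across pods.
                    "target": {"type": "Utilization", "averageUtilization": target_cpu_percent},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_hpa_manifest(), indent=2))
```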