Kubernetes for ML workloads in MLOps - Deep Dive

Overview - Kubernetes for ML workloads
What is it?
Kubernetes is a system that helps run and manage many computer programs on groups of computers. For machine learning (ML), it helps organize and run ML tasks like training models or serving predictions smoothly and reliably. It handles starting, stopping, and scaling these tasks automatically. This makes ML work easier to manage and more efficient.
Why it matters
Without Kubernetes, running ML tasks on many computers would be slow, error-prone, and hard to control. People would waste time fixing crashes or juggling resources manually. Kubernetes solves this by automating these tasks, so ML teams can focus on building better models and delivering results faster. It makes ML projects more reliable and scalable in real life.
Where it fits
Before learning Kubernetes for ML, you should understand basic ML workflows and container technology like Docker. After this, you can explore advanced ML deployment techniques, monitoring ML models in production, and using Kubernetes with specialized ML tools like Kubeflow or MLflow.
Mental Model
Core Idea
Kubernetes acts like a smart conductor that organizes and runs many ML tasks across computers, making sure they work well together and can grow or shrink as needed.
Think of it like...
Imagine a busy restaurant kitchen where many chefs prepare different dishes. Kubernetes is like the head chef who assigns tasks, ensures ingredients are available, and keeps the kitchen running smoothly even when orders change quickly.
┌─────────────────────────────────┐
│           Kubernetes            │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ ML Training │ │ ML Serving  │ │
│ │ Pods        │ │ Pods        │ │
│ └──────┬──────┘ └──────┬──────┘ │
│        │               │        │
│ ┌──────┴──────┐ ┌──────┴──────┐ │
│ │ Node 1      │ │ Node 2      │ │
│ └─────────────┘ └─────────────┘ │
└─────────────────────────────────┘
Build-Up - 7 Steps
1. Foundation: What Are Kubernetes and Pods?
Concept: Introduce Kubernetes as a system to run containers and explain what pods are.
Kubernetes is a tool that runs containers, which are like small packages holding your ML code and its environment. A pod is the smallest unit in Kubernetes and can hold one or more containers that work together. For ML, pods run tasks like training or prediction services.
Result
You understand Kubernetes runs containers inside pods, which are scheduled on computers called nodes.
Understanding pods as the basic unit helps you see how Kubernetes organizes ML tasks into manageable pieces.
2. Foundation: Containers for ML Workloads
Concept: Explain why containers are used for ML and how they package code and dependencies.
Containers bundle ML code, libraries, and settings so the ML task runs the same everywhere. This avoids problems like 'it works on my computer but not on yours.' Docker is a popular tool to create containers. Kubernetes runs these containers reliably.
Result
You know how containers make ML tasks portable and consistent across environments.
Recognizing containers as self-contained units clarifies why Kubernetes depends on them to manage ML workloads.
3. Intermediate: Scheduling ML Tasks on Nodes
Before reading on: do you think Kubernetes runs all ML tasks on one computer or spreads them across many? Commit to your answer.
Concept: Kubernetes schedules pods on different nodes (computers) to balance load and resources.
Kubernetes looks at the available nodes (computers) and decides where to run each pod based on the resources the pod requests, such as CPU and memory. For ML, this means heavy training jobs can be placed on powerful nodes while lighter tasks run elsewhere. Placement happens automatically when a pod is created; the default scheduler does not move running pods afterward.
Result
ML tasks run efficiently across multiple nodes without manual intervention.
Knowing Kubernetes schedules tasks smartly helps you trust it to optimize resource use for ML workloads.
4. Intermediate: Scaling ML Workloads Automatically
Before reading on: do you think ML workloads need manual scaling or can Kubernetes adjust automatically? Commit to your answer.
Concept: Kubernetes can increase or decrease the number of pods running ML tasks based on demand.
If many users request predictions, Kubernetes can start more pods to handle the load. When demand drops, it reduces pods to save resources. This is called autoscaling and helps ML services stay responsive and cost-effective.
Result
ML services adjust their size automatically to match workload changes.
Understanding autoscaling reveals how Kubernetes keeps ML systems efficient and responsive without constant human control.
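As a rough sketch of what an autoscaler acts on, here is a Deployment for a prediction service. The names (ml-deployment, ml-serving-image), the port, and the resource numbers are assumptions chosen for illustration, not values from a real system. The CPU request is included because utilization-based autoscaling is measured relative to it.

```yaml
# Hypothetical model-serving Deployment; names, image, and sizes are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-deployment
spec:
  replicas: 2                       # starting point; an autoscaler may change this
  selector:
    matchLabels:
      app: ml-serving
  template:
    metadata:
      labels:
        app: ml-serving             # must match the selector above
    spec:
      containers:
        - name: predictor
          image: ml-serving-image   # assumed image name
          ports:
            - containerPort: 8080   # assumed serving port
          resources:
            requests:
              cpu: "500m"           # CPU utilization targets are measured against this
              memory: "1Gi"
```

A HorizontalPodAutoscaler that targets this Deployment by name can then add or remove replicas between its configured minimum and maximum.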
5. Intermediate: Managing ML Data with Persistent Storage
Concept: Explain how Kubernetes handles data storage for ML tasks that need to save or access data persistently.
ML tasks often need to read or write data like training datasets or model files. Kubernetes uses Persistent Volumes (PV) and Persistent Volume Claims (PVC) to provide stable storage that pods can use even if they restart or move to another node.
Result
ML workloads can safely store and access data across pod restarts and rescheduling.
Knowing about persistent storage prevents data loss and supports reliable ML workflows on Kubernetes.
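A minimal sketch of a Persistent Volume Claim that a training pod could mount. The claim name (ml-pvc), the size, and the access mode are assumptions for illustration; the right values depend on the cluster's storage provider.

```yaml
# Hypothetical claim; the name, size, and access mode are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ml-pvc
spec:
  accessModes:
    - ReadWriteOnce          # mountable read-write by a single node
  resources:
    requests:
      storage: 50Gi          # assumed size for datasets and checkpoints
  # storageClassName is omitted here, so the cluster's default
  # StorageClass (if one is configured) is used.
```

A pod uses the claim by listing it under volumes with persistentVolumeClaim.claimName and mounting that volume in a container with volumeMounts.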
6. Advanced: Using Custom Resources for ML Pipelines
Before reading on: do you think Kubernetes can understand ML-specific tasks natively or needs extensions? Commit to your answer.
Concept: Kubernetes can be extended with custom resources to manage complex ML workflows like pipelines.
Tools like Kubeflow add custom resource definitions (CRDs) to Kubernetes, letting it understand ML concepts like training jobs, hyperparameter tuning, and pipelines. This makes managing ML workflows easier and more integrated.
Result
You can run and control complex ML pipelines inside Kubernetes using specialized tools.
Recognizing Kubernetes extensibility shows how it adapts to ML needs beyond basic container management.
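To make the idea of a custom resource concrete, here is a sketch of a distributed training job written against the Kubeflow Training Operator's PyTorchJob resource. It assumes the operator and its CRDs are already installed in the cluster; the job name, image, command, and replica counts are illustrative, and the field layout should be checked against the operator version in use.

```yaml
# Sketch of a Kubeflow Training Operator resource (assumes the operator is installed).
# Image, command, and replica counts are illustrative.
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: ml-distributed-train
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch        # the operator expects this container name
              image: ml-image
              command: ["python", "train.py"]
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: ml-image
              command: ["python", "train.py"]
```

Plain Kubernetes has no PyTorchJob kind; the API server only accepts this manifest once the CRD is registered, which is the sense in which such tools extend Kubernetes.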
7. Expert: Optimizing Resource Allocation for ML Workloads
Before reading on: do you think Kubernetes always perfectly allocates resources for ML tasks or can it be tuned? Commit to your answer.
Concept: Advanced tuning of resource requests, limits, and node selection improves ML workload performance and cost.
ML workloads vary in resource needs. Setting accurate CPU, memory, and GPU requests and limits helps Kubernetes schedule pods efficiently. Node selectors steer ML pods toward suitable hardware, and taints with matching tolerations keep other workloads off that hardware. Misconfiguration can cause slow training or wasted resources.
Result
ML workloads run faster and cheaper with tuned resource settings and node targeting.
Understanding resource tuning prevents common performance bottlenecks and cost overruns in ML on Kubernetes.
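The tuning knobs described in this step can be sketched in one pod spec. The node label (accelerator: nvidia-gpu) and the taint key (gpu-only) are hypothetical; a real cluster defines its own labels and taints, and the GPU resource is only available where the NVIDIA device plugin is running.

```yaml
# Hypothetical GPU training pod; label, taint key, and sizes are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: ml-gpu-training
spec:
  restartPolicy: Never
  nodeSelector:
    accelerator: nvidia-gpu      # only schedule onto nodes carrying this label
  tolerations:
    - key: "gpu-only"            # allows scheduling onto nodes tainted with this key
      operator: "Exists"
      effect: "NoSchedule"
  containers:
    - name: trainer
      image: ml-image
      command: ["python", "train.py"]
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          memory: "16Gi"
          nvidia.com/gpu: 1      # GPUs are declared under limits
```

The node selector pulls the pod toward GPU nodes, while the toleration only permits it to land there; both are usually needed when GPU nodes are tainted to keep general workloads away.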
Under the Hood
Kubernetes runs ML workloads by creating pods that contain containers with ML code. It uses a control plane to monitor cluster state and schedules pods on nodes based on resource availability and constraints. The kubelet on each node manages pod lifecycle. Persistent storage is abstracted via volumes. Extensions like CRDs allow Kubernetes to manage ML-specific resources and workflows.
Why designed this way?
Kubernetes was designed to manage containerized applications at scale with high availability and flexibility. Its modular architecture separates control and data planes, enabling extensibility. For ML, this design allows integration of specialized tools without changing core Kubernetes, supporting diverse ML workloads and rapid innovation.
┌───────────────┐       ┌───────────────┐
│ Control Plane │──────▶│ Scheduler     │
│ (API Server)  │       │ (Decides pods)│
└──────┬────────┘       └──────┬────────┘
       │                       │
       ▼                       ▼
┌───────────────┐       ┌───────────────┐
│ Node 1        │       │ Node 2        │
│ ┌───────────┐ │       │ ┌───────────┐ │
│ │ Pod (ML)  │ │       │ │ Pod (ML)  │ │
│ └───────────┘ │       │ └───────────┘ │
└───────────────┘       └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think Kubernetes automatically understands ML tasks without extra tools? Commit to yes or no.
Common Belief: Kubernetes can natively manage all ML workflows without any extensions.
Reality: Kubernetes manages containers but needs extensions like Kubeflow to handle ML-specific workflows and pipelines.
Why it matters: Assuming native ML support leads to confusion and wasted effort trying to build complex ML pipelines without proper tools.
Quick: Do you think Kubernetes always uses all available resources on nodes fully? Commit to yes or no.
Common Belief: Kubernetes always perfectly uses all CPU, memory, and GPU resources on nodes.
Reality: Kubernetes relies on resource requests and limits set by users; poor settings can cause underutilization or overload.
Why it matters: Misconfigured resources cause slow ML training or wasted cloud costs, hurting project efficiency.
Quick: Do you think ML data inside pods is safe after pod restarts? Commit to yes or no.
Common Belief: Data stored inside a pod's container is safe even if the pod restarts or moves.
Reality: Data written to a container's own filesystem is lost when the container restarts or the pod is replaced; persistent volumes are needed for safe data storage.
Why it matters: Losing training data or models after pod restarts can cause lost work and unreliable ML results.
Quick: Do you think Kubernetes autoscaling works perfectly for all ML workloads without tuning? Commit to yes or no.
Common Belief: Kubernetes autoscaling always adjusts ML workloads perfectly without any configuration.
Reality: Autoscaling needs proper metrics and configuration; otherwise, it may scale too late or too much.
Why it matters: Poor autoscaling causes slow responses or wasted resources, impacting ML service quality and cost.
Expert Zone
1. Kubernetes scheduling decisions can be influenced by subtle factors like pod affinity, taints, and tolerations, which experts use to optimize ML workload placement.
2. GPU resource management in Kubernetes requires special device plugins and careful quota settings to avoid conflicts and maximize ML training speed.
3. Network policies in Kubernetes can restrict ML service communication, so experts design policies balancing security and necessary data flow.
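As one illustration of the network policy point, the sketch below allows traffic into model-serving pods only from pods labeled as an API gateway. The labels (app: ml-serving, role: api-gateway) and the port are hypothetical, and the policy only takes effect on clusters whose network plugin enforces NetworkPolicy.

```yaml
# Hypothetical policy; labels and port are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-to-ml-serving
spec:
  podSelector:
    matchLabels:
      app: ml-serving              # the pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: api-gateway    # the only pods allowed to connect
      ports:
        - protocol: TCP
          port: 8080               # assumed serving port
```

Because the policy selects the serving pods for ingress, any other inbound traffic to them is dropped, so monitoring scrapers or health checkers would need their own allow rules.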
When NOT to use
Kubernetes may be too complex for small or simple ML projects where lightweight solutions like local Docker or managed ML platforms (e.g., SageMaker, Vertex AI) are better. For real-time, low-latency ML inference, a specialized serving system might serve the workload better than a general-purpose Kubernetes setup.
Production Patterns
In production, ML teams use Kubernetes with Kubeflow for pipelines, set up autoscaling based on custom ML metrics, use GPU nodes with device plugins, and integrate monitoring tools like Prometheus to track ML workload health and performance.
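Autoscaling on a custom ML metric might look like the sketch below. The metric name (inference_requests_per_second) and the target value are hypothetical, and the setup assumes a custom metrics adapter (for example, one backed by Prometheus) is installed so the autoscaler can read per-pod metrics.

```yaml
# Hypothetical autoscaler on a per-pod custom metric; metric name and numbers are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: inference_requests_per_second   # must be exposed via the custom metrics API
        target:
          type: AverageValue
          averageValue: "50"                    # aim for about 50 requests/s per pod
```

A request-rate metric often tracks inference load more directly than CPU, particularly when the heavy work happens on a GPU.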
Connections
Distributed Systems
Kubernetes builds on distributed system principles to manage workloads across many machines.
Understanding distributed systems helps grasp how Kubernetes handles failures, scaling, and coordination for ML workloads.
Cloud Computing
Kubernetes runs on cloud infrastructure to provide scalable ML services.
Knowing cloud basics clarifies how Kubernetes leverages virtual machines and storage to run ML tasks flexibly.
Factory Assembly Lines
Kubernetes orchestrates ML tasks like an assembly line organizes production steps.
Seeing ML workflows as assembly lines helps understand how Kubernetes pipelines automate and coordinate complex ML processes.
Common Pitfalls
#1 Running ML workloads without setting resource requests and limits.
Wrong approach:
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  containers:
  - name: trainer
    image: ml-image
    command: ["python", "train.py"]
Correct approach:
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  containers:
  - name: trainer
    image: ml-image
    command: ["python", "train.py"]
    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
      limits:
        cpu: "8"
        memory: "32Gi"
Root cause: Beginners often skip resource settings, not realizing that the scheduler uses requests to pick a node. Without them, a pod can land on an already busy node and is among the first to be evicted when the node runs short of memory.
#2 Storing ML data in ephemeral pod storage (the container filesystem or an emptyDir volume) and expecting it to persist.
Wrong approach:
apiVersion: v1
kind: Pod
metadata:
  name: ml-pod
spec:
  containers:
  - name: ml-container
    image: ml-image
    volumeMounts:
    - mountPath: /data
      name: data-volume
  volumes:
  - name: data-volume
    emptyDir: {}
Correct approach:
apiVersion: v1
kind: Pod
metadata:
  name: ml-pod
spec:
  containers:
  - name: ml-container
    image: ml-image
    volumeMounts:
    - mountPath: /data
      name: data-volume
  volumes:
  - name: data-volume
    persistentVolumeClaim:
      claimName: ml-pvc
Root cause: Not realizing that emptyDir is temporary storage. It survives container restarts but is deleted when the pod is removed from its node, for example when the pod is deleted or rescheduled.
#3 Expecting Kubernetes to autoscale ML workloads without configuring metrics.
Wrong approach:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-deployment
  minReplicas: 1
  maxReplicas: 10
Correct approach:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Root cause: Without metrics chosen for the workload, the autoscaler has no meaningful signal for when to scale. In autoscaling/v2, omitting metrics falls back to a default CPU utilization target (80% according to the API reference), which may not reflect an ML service's real load. A CPU utilization target also only works when the target pods declare CPU requests.
Key Takeaways
Kubernetes manages ML workloads by running containerized tasks inside pods distributed across many computers.
Containers make ML code portable and consistent, while Kubernetes automates running, scaling, and recovering these tasks.
Proper resource settings and persistent storage are essential to run ML workloads efficiently and reliably on Kubernetes.
Extensions like Kubeflow add ML-specific features, enabling complex pipelines and workflows within Kubernetes.
Expert tuning of scheduling, autoscaling, and GPU usage unlocks the full power of Kubernetes for production ML systems.

Practice

1. What is the primary Kubernetes resource used to run a one-time ML training task?
easy
A. Job
B. Deployment
C. Service
D. ConfigMap

Solution

  1. Step 1: Understand Kubernetes resource types

    Jobs are designed to run tasks that complete once, like ML training.
  2. Step 2: Match resource to ML training task

    Since training is a one-time batch task, Job is the correct resource.
  3. Final Answer:

    Job -> Option A
  4. Quick Check:

    One-time ML training = Job [OK]
Hint: Use Job for one-time tasks like training [OK]
Common Mistakes:
  • Choosing Deployment which is for long-running services
  • Confusing Service with workload resource
  • Using ConfigMap which stores config data only
2. Which of the following is the correct YAML snippet to request 2 GPUs in a Kubernetes pod spec?
easy
A. resources: requests: cpu: 2
B. resources: limits: memory: 2Gi
C. resources: limits: nvidia.com/gpu: 2
D. resources: requests: gpu: 2

Solution

  1. Step 1: Identify GPU resource naming in Kubernetes

    GPUs are requested using the vendor-specific resource name like nvidia.com/gpu.
  2. Step 2: Check correct YAML structure for limits

    GPUs are specified under limits with the vendor's resource key. Kubernetes uses the limit as the request, and if both are given they must be equal.
  3. Final Answer:

    resources: limits: nvidia.com/gpu: 2 -> Option C
  4. Quick Check:

    GPU request uses nvidia.com/gpu under limits [OK]
Hint: GPU requests use 'limits' with 'nvidia.com/gpu' key [OK]
Common Mistakes:
  • Using 'gpu' instead of 'nvidia.com/gpu'
  • Placing GPU under requests instead of limits
  • Confusing CPU or memory keys with GPU
3. Given this Kubernetes Job YAML snippet, what will happen when applied?
apiVersion: batch/v1
kind: Job
metadata:
  name: ml-train
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: ml-image:latest
        command: ["python", "train.py"]
      restartPolicy: Never
  backoffLimit: 3
medium
A. The Job runs the training once and retries up to 3 times on failure
B. The Job runs continuously without stopping
C. The Job will fail immediately due to missing restartPolicy
D. The Job creates a Deployment instead of a batch task

Solution

  1. Step 1: Understand Job behavior with backoffLimit

    The backoffLimit sets how many retries happen on failure before Job stops.
  2. Step 2: Check restartPolicy and command

    restartPolicy: Never means a failed container is not restarted in place; instead, the Job controller creates a new pod for each retry.
  3. Final Answer:

    The Job runs the training once and retries up to 3 times on failure -> Option A
  4. Quick Check:

    Job with backoffLimit retries 3 times [OK]
Hint: backoffLimit controls retry count for Job failures [OK]
Common Mistakes:
  • Thinking Job runs continuously like Deployment
  • Assuming restartPolicy: Never causes immediate failure
  • Confusing Job with Deployment resource
4. You deployed an ML model with a Deployment but the pods keep restarting. Which is the most likely cause?
medium
A. The ConfigMap is not mounted
B. The Deployment spec is missing replicas field
C. The Service is not exposing the Deployment
D. The container image or its startup command is incorrect

Solution

  1. Step 1: Analyze pod restart reasons

    Repeated restarts (CrashLoopBackOff) mean the container starts and then exits, commonly because the image contains the wrong code or the startup command fails. An image that cannot be pulled at all shows ImagePullBackOff instead of restarts.
  2. Step 2: Check other options relevance

    A missing replicas field defaults to 1, Service exposure does not affect pod health, and a ConfigMap that is not mounted causes configuration errors but does not necessarily cause restarts.
  3. Final Answer:

    The container image or its startup command is incorrect -> Option D
  4. Quick Check:

    Pod restarts usually mean the container is crashing on startup [OK]
Hint: Pod restarts often mean container image or command error [OK]
Common Mistakes:
  • Assuming missing replicas causes restarts
  • Confusing Service exposure with pod health
  • Thinking ConfigMap absence always crashes pods
5. You want to deploy an ML model serving system that automatically scales based on CPU usage. Which Kubernetes resource and feature combination is best?
hard
A. DaemonSet to run one pod per node
B. Deployment with Horizontal Pod Autoscaler (HPA)
C. StatefulSet with persistent volumes
D. Job with backoffLimit set to 5

Solution

  1. Step 1: Identify resource for long-running model serving

    Deployment manages long-running pods and supports updates.
  2. Step 2: Choose scaling feature for CPU-based autoscaling

    Horizontal Pod Autoscaler (HPA) automatically adjusts pod count based on CPU usage.
  3. Final Answer:

    Deployment with Horizontal Pod Autoscaler (HPA) -> Option B
  4. Quick Check:

    Use Deployment + HPA for scalable model serving [OK]
Hint: Use Deployment + HPA for auto-scaling model serving [OK]
Common Mistakes:
  • Using Job which is for batch tasks, not serving
  • Choosing StatefulSet which is for stateful apps
  • DaemonSet runs pods on all nodes, not for scaling