
Kubernetes for ML workloads in MLOps - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Understanding Kubernetes Pod Scheduling for ML Jobs

Which Kubernetes feature ensures that ML training pods are scheduled on nodes with GPUs?

A. Node Affinity with GPU label selectors
B. Horizontal Pod Autoscaler
C. ConfigMap volume mounts
D. Pod Security Policies
💡 Hint

Think about how Kubernetes selects nodes based on hardware capabilities.
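To make the hint concrete: scheduling onto specific hardware is typically expressed through node labels and affinity rules. A minimal sketch of a pod pinned to GPU-labeled nodes (the `accelerator: nvidia` label and the image name are assumptions; label conventions vary by cluster):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-train-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: accelerator        # assumed node label, set by the cluster admin
                operator: In
                values:
                  - nvidia
  containers:
    - name: trainer
      image: ml-train:latest            # hypothetical training image
      resources:
        limits:
          nvidia.com/gpu: 1             # also request the GPU via the device plugin resource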

💻 Command Output · intermediate
Output of kubectl describe on ML Training Pod

What is the output of kubectl describe pod ml-train-pod if the pod is pending due to insufficient GPU resources?

kubectl describe pod ml-train-pod
A.
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  1m    default-scheduler  Successfully assigned ml-train-pod to node-1
B.
Status: Running
Containers:
  ml-container:
    State: Running
    Ready: True
C. Error from server (NotFound): pods "ml-train-pod" not found
D.
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  2m    default-scheduler  0/3 nodes are available: 3 Insufficient nvidia.com/gpu.
💡 Hint

Look for messages about resource availability in the pod events.
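For context, the Pending state this question describes arises when a pod requests more `nvidia.com/gpu` than any node can currently offer. A minimal sketch of such a request (image name and GPU count are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-train-pod
spec:
  containers:
    - name: trainer
      image: ml-train:latest          # hypothetical training image
      resources:
        limits:
          nvidia.com/gpu: 2           # if no node has 2 free GPUs, the pod stays Pending
```

The scheduler then records a FailedScheduling event on the pod rather than an error in the container logs, which is why `kubectl describe pod` is the right place to look.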

Configuration · advanced
Configuring Persistent Storage for ML Data in Kubernetes

Which YAML snippet correctly defines a PersistentVolumeClaim (PVC) for 50Gi of storage with ReadWriteOnce access mode suitable for ML training data?

A.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ml-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    limits:
      storage: 50Gi
B.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ml-data-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    limits:
      storage: 50Gi
C.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ml-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
D.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ml-data-pvc
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 50Gi
💡 Hint

Check the correct field for storage size and access mode for single node write access.
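A claim on its own holds no data for a pod until it is mounted; a training pod consumes a PVC such as `ml-data-pvc` through a volume. A minimal sketch (image name and mount path are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-train-pod
spec:
  containers:
    - name: trainer
      image: ml-train:latest               # hypothetical training image
      volumeMounts:
        - name: training-data
          mountPath: /data                 # where the training code reads its dataset
  volumes:
    - name: training-data
      persistentVolumeClaim:
        claimName: ml-data-pvc             # references the PVC from this problem
```

ReadWriteOnce allows the volume to be mounted read-write by a single node, which fits a single training pod writing checkpoints and reading data.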

🔀 Workflow · advanced
Steps to Deploy a Distributed ML Training Job on Kubernetes

What is the correct order of steps to deploy a distributed ML training job using Kubernetes?

A. 3, 1, 2, 4
B. 1, 3, 2, 4
C. 1, 2, 3, 4
D. 2, 1, 3, 4
💡 Hint

Think about building the image before pushing and defining manifests before applying.
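The workflow the hint describes can be sketched as shell commands (the registry URL and manifest file name are assumptions, and the manifest could equally be a Job or an operator-specific resource such as a PyTorchJob):

```shell
docker build -t registry.example.com/ml-train:v1 .   # build the training image
docker push registry.example.com/ml-train:v1         # push it to a registry the cluster can pull from
vim train-job.yaml                                   # define the Kubernetes manifest for the distributed job
kubectl apply -f train-job.yaml                      # apply the manifest to the cluster
kubectl get pods -w                                  # watch the worker pods start
```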

Troubleshoot · expert
Diagnosing ML Pod CrashLoopBackOff due to GPU Driver Issues

An ML training pod repeatedly crashes with CrashLoopBackOff, and its logs show "Failed to initialize GPU device". What is the most likely cause?

A. The node lacks the NVIDIA GPU device plugin or drivers are not installed properly.
B. The pod's container image is missing the ML training code.
C. The PersistentVolumeClaim is not bound to a PersistentVolume.
D. The pod's CPU resource requests exceed node capacity.
💡 Hint

Consider hardware and driver compatibility for GPU access.
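A few diagnostic commands for this scenario (a hedged sketch; the device plugin's namespace and pod naming vary by how it was installed, and `<gpu-node>` is a placeholder):

```shell
kubectl describe node <gpu-node> | grep nvidia.com/gpu   # is the GPU resource advertised at all?
kubectl get pods -n kube-system | grep nvidia            # is the device plugin DaemonSet running?
kubectl logs ml-train-pod --previous                     # logs from the crashed container instance
```

If the node does not advertise `nvidia.com/gpu` capacity, the device plugin or host drivers are the first things to check before suspecting the training code itself.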