
Compute resource management in MLOps - Practice Problems & Coding Challenges

Challenge - 5 Problems
💻 Command Output
intermediate
Output of Kubernetes resource request command
What is the output of the following command when run on a pod named ml-training-pod in Kubernetes?

kubectl get pod ml-training-pod -o jsonpath='{.spec.containers[0].resources.requests}'
A. {"cpu":"500m","memory":"1Gi"}
B. {"cpu":"1","memory":"512Mi"}
C. Error: resource requests not found
D. {"cpu":"100m","memory":"256Mi"}
💡 Hint
Resource requests define the minimum compute resources a container needs.
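To make the jsonpath expression concrete, here is a minimal Python sketch that walks the same path through a pod object. The pod_json value is hypothetical sample data, not the actual spec of ml-training-pod; the real command prints whatever requests that pod's first container declares.

```python
import json

# Hypothetical pod object, trimmed to the fields the jsonpath expression touches.
pod_json = """
{
  "spec": {
    "containers": [
      {
        "name": "trainer",
        "resources": {
          "requests": {"cpu": "2", "memory": "4Gi"},
          "limits": {"cpu": "4", "memory": "8Gi"}
        }
      }
    ]
  }
}
"""


def first_container_requests(pod: dict) -> dict:
    """Follow the same path as {.spec.containers[0].resources.requests}."""
    resources = pod["spec"]["containers"][0].get("resources", {})
    # In this sketch, a container that declares no requests yields an empty mapping.
    return resources.get("requests", {})


pod = json.loads(pod_json)
requests = first_container_requests(pod)
print(json.dumps(requests, separators=(",", ":")))
```

Only the requests of the first container are returned; limits and any other containers in the pod are ignored by this path.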
🧠 Conceptual
intermediate
Understanding GPU resource allocation in MLOps
In an MLOps pipeline, why is it important to specify GPU resource limits for training jobs?
A. To allow the training job to run without any CPU resource limits
B. To prevent a training job from using more GPU memory than allocated, avoiding interference with other jobs
C. To increase the training speed by automatically adding more GPUs when needed
D. To disable GPU usage and force training on CPU only
💡 Hint
Think about resource sharing in a multi-tenant environment.
🔀 Workflow
advanced
Order the steps to configure autoscaling for compute resources in a Kubernetes cluster
Arrange the following steps in the correct order to enable autoscaling of pods based on CPU usage in Kubernetes.
A. 1,3,2,4
B. 3,1,2,4
C. 1,2,3,4
D. 3,2,1,4
💡 Hint
Metrics collection must be ready before autoscaler can use metrics.
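Once metrics are available, a CPU-based autoscaler scales in proportion to how far observed utilization is from the target. The sketch below follows the scaling rule described in the Kubernetes Horizontal Pod Autoscaler documentation (desired = ceil(current replicas × current metric / target metric)). The function name, the replica bounds, and the 10% tolerance default are assumptions for illustration, and real autoscalers add stabilization behavior that is not modeled here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_utilization: float,
                     target_cpu_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas].

    Utilization values are percentages of the pods' CPU requests, which is
    why the pods need CPU requests and a working metrics source first.
    """
    ratio = current_cpu_utilization / target_cpu_utilization
    # Skip scaling when the ratio is close enough to 1.0 (assumed tolerance).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 90, 60))   # utilization above target: scale out
print(desired_replicas(4, 30, 60))   # utilization below target: scale in
```

Because utilization is computed against CPU requests, a deployment without requests gives the autoscaler nothing to compare against.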
Troubleshoot
advanced
Identify the cause of pod scheduling failure due to resource constraints
A pod in Kubernetes fails to schedule with the message: 0/5 nodes are available: 5 Insufficient cpu. What is the most likely cause?
A. The pod's node selector does not match any node labels.
B. The pod's container image is too large to download on nodes.
C. The pod lacks a resource limit for memory.
D. The pod requests more CPU than any node currently has available.
💡 Hint
Focus on the error message about CPU availability.
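The scheduler's CPU check can be sketched as a comparison between the pod's CPU request and each node's unreserved CPU. This is a simplified illustration, not the scheduler's implementation: the Node class, its field names, and the sample numbers are hypothetical, and real scheduling also considers memory, taints, affinity, and other filters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int  # CPU the node offers to pods, in millicores
    requested_cpu_m: int    # sum of CPU requests of pods already on the node

    @property
    def free_cpu_m(self) -> int:
        return self.allocatable_cpu_m - self.requested_cpu_m


def schedule_report(pod_cpu_request_m: int, nodes: List[Node]) -> str:
    """Report which nodes can fit the pod, based on CPU requests only."""
    fitting = [n.name for n in nodes if n.free_cpu_m >= pod_cpu_request_m]
    if fitting:
        return "fits on: " + ", ".join(fitting)
    return f"0/{len(nodes)} nodes are available: {len(nodes)} Insufficient cpu."


# Five hypothetical nodes, each with 4 CPUs and 3.5 CPUs already requested.
nodes = [Node(f"node-{i}", 4000, 3500) for i in range(5)]
print(schedule_report(1000, nodes))  # request exceeds every node's free CPU
print(schedule_report(500, nodes))   # request fits on every node
```

The comparison uses requests already reserved on the node, not live CPU usage, so a node can look idle in monitoring and still reject the pod.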
Best Practice
expert
Best practice for managing compute resources in multi-tenant MLOps environments
Which practice best ensures fair and efficient compute resource usage among multiple teams running ML workloads on shared infrastructure?
A. Disable resource requests and limits to maximize scheduling flexibility.
B. Allow all teams to request unlimited resources and rely on manual monitoring.
C. Implement resource quotas and limit ranges per namespace to control resource consumption.
D. Use a single shared namespace without resource restrictions for simplicity.
💡 Hint
Think about automated controls to prevent resource hogging.
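How quotas and limit ranges work together can be sketched as an admission check: a limit range supplies default requests for pods that omit them, and a quota rejects any pod that would push the namespace total past its hard cap. The function, the resource keys (cpu_m, memory_mi), and all numbers below are hypothetical; this illustrates the idea rather than the Kubernetes admission controllers themselves.

```python
from typing import Dict, Tuple


def admit(pod_requests: Dict[str, int],
          namespace_used: Dict[str, int],
          quota_hard: Dict[str, int],
          default_request: Dict[str, int]) -> Tuple[bool, str]:
    """Decide whether a pod fits within a namespace's quota.

    Units are illustrative: cpu_m in millicores, memory_mi in MiB.
    """
    # Limit-range step: fill in defaults for any resource the pod omits.
    effective = {**default_request, **pod_requests}
    # Quota step: reject if any resource would exceed its hard cap.
    for resource, hard in quota_hard.items():
        total = namespace_used.get(resource, 0) + effective.get(resource, 0)
        if total > hard:
            return False, f"exceeded quota for {resource}: {total} > {hard}"
    return True, "admitted"


quota_hard = {"cpu_m": 8000, "memory_mi": 16384}
namespace_used = {"cpu_m": 6000, "memory_mi": 8192}
default_request = {"cpu_m": 500, "memory_mi": 512}

print(admit({"cpu_m": 4000}, namespace_used, quota_hard, default_request))
print(admit({}, namespace_used, quota_hard, default_request))
```

Applying the check per namespace means one team exhausting its quota does not affect what other teams can schedule.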

Practice

1. What is the main purpose of compute resource management in MLOps?
easy
A. To write machine learning model code
B. To store data permanently on disk
C. To create user interfaces for ML applications
D. To control CPU, memory, and GPU usage for efficient job execution

Solution

  1. Step 1: Understand resource management role

    Compute resource management controls hardware resources like CPU, memory, and GPU.
  2. Step 2: Identify its purpose in MLOps

    Managing these resources lets jobs run efficiently and helps avoid crashes caused by resource exhaustion.
  3. Final Answer:

    To control CPU, memory, and GPU usage for efficient job execution -> Option D
  4. Quick Check:

    Resource management = control CPU, memory, GPU [OK]
Hint: Think about what hardware resources need managing
Common Mistakes:
  • Confusing resource management with coding tasks
  • Thinking it manages data storage only
  • Assuming it builds user interfaces
2. Which command correctly allocates GPU resources for a job in Kubernetes?
easy
A. kubectl run job --gpu=2
B. kubectl run job --requests=nvidia.com/gpu=2
C. kubectl run job --memory=2Gi
D. kubectl run job --cpu=2

Solution

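  1. Step 1: Recall how Kubernetes names GPU resources

    Kubernetes exposes GPUs as an extended resource named by the device plugin, for example nvidia.com/gpu for NVIDIA GPUs. A job asks for GPUs by naming that resource in its resource specification.
  2. Step 2: Match correct GPU allocation command

    kubectl run job --requests=nvidia.com/gpu=2 is the only option that names the nvidia.com/gpu resource. Note that newer kubectl releases have removed the --requests and --limits flags from kubectl run, and that Kubernetes expects GPUs to be declared under resources.limits in the pod spec, so on a current cluster this allocation is normally written in a manifest.
  3. Final Answer:

    kubectl run job --requests=nvidia.com/gpu=2 -> Option B
  4. Quick Check:

    GPU allocation uses the nvidia.com/gpu resource key [OK]
Hint: Look for the nvidia.com/gpu resource key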
Common Mistakes:
  • Using --gpu directly (not valid syntax)
  • Confusing memory or CPU flags with GPU
  • Missing the resource request keyword
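The same GPU allocation can also be written as a pod manifest. The sketch below builds one in Python and prints it as JSON. It assumes a cluster with the NVIDIA device plugin installed; the function name, pod name, and image are placeholders. The GPU count is placed under limits, which is where Kubernetes expects extended resources such as nvidia.com/gpu to be declared.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a minimal Pod manifest that asks for `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are whole devices; they are declared under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Placeholder name and image, for illustration only.
manifest = gpu_pod_manifest("job", "example.com/ml/trainer:latest", 2)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with kubectl apply -f, which accepts JSON as well as YAML manifests.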
3. Given this Kubernetes pod spec snippet, what is the CPU limit set for the container?
resources:
  limits:
    cpu: "4"
  requests:
    cpu: "2"
medium
A. 4 CPUs
B. 6 CPUs
C. No CPU limit set
D. 2 CPUs

Solution

  1. Step 1: Identify CPU limit in pod spec

    The 'limits' section sets the maximum CPU the container may use; here cpu: "4" means 4 CPUs.
  2. Step 2: Understand difference between requests and limits

    Requests are the amount reserved for the container at scheduling time (2 CPUs); limits are the maximum it is allowed to use (4 CPUs).
  3. Final Answer:

    4 CPUs -> Option A
  4. Quick Check:

    CPU limit = 4 CPUs [OK]
Hint: Limits set max CPU, requests set minimum
Common Mistakes:
  • Confusing requests with limits
  • Ignoring quotes around CPU values
  • Assuming no limit means unlimited
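CPU quantities in a pod spec can be written as whole or fractional CPUs ("4", "0.5") or as millicores ("500m"), where 1000m equals one CPU. The helper below is a minimal sketch that normalizes these forms to millicores so requests and limits can be compared. The function name is an assumption, and it handles only these common forms, not the full Kubernetes quantity grammar.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "4", "0.5", or "500m" to millicores."""
    q = quantity.strip()
    if q.endswith("m"):
        # Already in millicores, e.g. "500m" is half a CPU.
        return int(q[:-1])
    # Whole or fractional CPUs, e.g. "4" or "0.5".
    return int(round(float(q) * 1000))


# Values from the snippet in the question.
resources = {"limits": {"cpu": "4"}, "requests": {"cpu": "2"}}
limit_m = cpu_to_millicores(resources["limits"]["cpu"])
request_m = cpu_to_millicores(resources["requests"]["cpu"])
print(f"request={request_m}m limit={limit_m}m")
```

The quotes in the YAML only make the value a string; "4" and 4 denote the same quantity of four CPUs.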
4. You see this error when submitting a job: Insufficient cpu resources. What is the most likely cause?
medium
A. The job is missing GPU allocation
B. The job has no CPU requests set
C. The job requests more CPU than available on the cluster
D. The job memory limit is too high

Solution

  1. Step 1: Interpret the error message

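    'Insufficient cpu resources' means the CPU the job requests exceeds what the cluster currently has available to allocate.
  2. Step 2: Identify cause from options

    Option C matches the error: the job requests more CPU than is available on the cluster. A job with no CPU requests would not trigger this error, and GPU or memory settings produce different messages.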
  3. Final Answer:

    The job requests more CPU than available on the cluster -> Option C
  4. Quick Check:

    Insufficient CPU = request > available [OK]
Hint: The error means requested CPU > available cluster CPU
Common Mistakes:
  • Assuming missing CPU requests cause this error
  • Confusing CPU and GPU errors
  • Blaming memory limits for CPU shortage
5. You want to run multiple ML training jobs on a GPU cluster. Which strategy best manages GPU resources to avoid conflicts?
hard
A. Allocate GPUs explicitly per job and release after completion
B. Run all jobs without GPU limits and share GPUs freely
C. Assign CPU limits only and ignore GPU allocation
D. Use only CPU resources to avoid GPU conflicts

Solution

  1. Step 1: Understand GPU resource management needs

    Explicit allocation prevents multiple jobs from using the same GPU simultaneously.
  2. Step 2: Evaluate options for best practice

    Option A gives each job its own GPUs and returns them to the pool when the job completes, so no two jobs contend for the same device. The other options either leave GPUs unmanaged or avoid using them at all.
  3. Final Answer:

    Allocate GPUs explicitly per job and release after completion -> Option A
  4. Quick Check:

    Explicit GPU allocation avoids conflicts [OK]
Hint: Always allocate and release GPUs per job
Common Mistakes:
  • Ignoring GPU allocation causing conflicts
  • Assuming CPU limits control GPU usage
  • Avoiding GPUs when cluster has them
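The allocate-then-release pattern can be sketched as a small in-process GPU pool. This is an illustration of the idea only: the GpuPool class and its methods are hypothetical, and on a real cluster the scheduler or orchestrator performs this bookkeeping. The context manager guarantees that GPUs are returned even if the job raises an error.

```python
import threading
from contextlib import contextmanager
from typing import Iterable, Iterator, List


class GpuPool:
    """Tracks which GPU ids are free and hands them out exclusively."""

    def __init__(self, gpu_ids: Iterable[int]) -> None:
        self._free = set(gpu_ids)
        self._lock = threading.Lock()

    def free_count(self) -> int:
        with self._lock:
            return len(self._free)

    @contextmanager
    def allocate(self, count: int) -> Iterator[List[int]]:
        # Reserve `count` GPUs atomically, or fail without reserving any.
        with self._lock:
            if count > len(self._free):
                raise RuntimeError(
                    f"requested {count} GPUs, only {len(self._free)} free")
            granted = sorted(self._free)[:count]
            self._free.difference_update(granted)
        try:
            yield granted
        finally:
            # Release on completion or failure so other jobs can proceed.
            with self._lock:
                self._free.update(granted)


pool = GpuPool([0, 1, 2, 3])
with pool.allocate(2) as job_a, pool.allocate(2) as job_b:
    print("job A uses", job_a, "job B uses", job_b)
print("free after completion:", pool.free_count())
```

Failing fast when too few GPUs are free is one design choice; a queue that waits for GPUs to be released would be an equally reasonable alternative.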