Practice

1. What is the main goal of cost optimization at scale in MLOps?

easy

A. To increase the number of servers regardless of workload

B. To avoid monitoring costs after deployment

C. To use only the most expensive cloud resources

D. To save money by matching resource use to workload needs

Solution

Step 1: Understand cost optimization purpose
Cost optimization means using resources efficiently to reduce expenses.
Step 2: Match resources to workload needs
Adjusting resources based on workload avoids waste and saves money.
Final Answer:
To save money by matching resource use to workload needs -> Option D
Quick Check:
Cost optimization = save money by matching resources [OK]
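
The idea of matching resources to workload can be made concrete with a small sizing calculation. The sketch below is illustrative only: the function name, the per-replica capacity, the 20% headroom, and the load figures are assumptions, not values from this quiz.

```python
import math


def replicas_needed(load_rps: float, capacity_rps_per_replica: float,
                    headroom: float = 0.2) -> int:
    """Smallest replica count that serves the load with some spare headroom.

    All parameters are hypothetical; real capacity figures come from load tests.
    """
    if capacity_rps_per_replica <= 0:
        raise ValueError("capacity_rps_per_replica must be positive")
    required = load_rps * (1 + headroom) / capacity_rps_per_replica
    # Always keep at least one replica running.
    return max(1, math.ceil(required))


if __name__ == "__main__":
    fixed_fleet = 10  # a fleet sized for peak load and never reduced
    for load in (50, 200, 400):
        sized = replicas_needed(load, capacity_rps_per_replica=50)
        print(f"load={load} rps: right-sized={sized}, fixed={fixed_fleet}")
```

At low load the right-sized fleet is much smaller than the fixed one, which is where the savings come from.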

Hint: Cost optimization means using just enough resources [OK]

Common Mistakes:

Thinking more servers always means better
Ignoring cost monitoring after deployment
Assuming expensive resources are always best

2. Which of the following is a correct way to schedule a pod onto spot-instance nodes in a Kubernetes pod spec for cost savings?

easy

A.
   affinity:
     nodeAffinity:
       requiredDuringSchedulingIgnoredDuringExecution:
         nodeSelectorTerms:
         - matchExpressions:
           - key: "kubernetes.io/lifecycle"
             operator: In
             values:
             - spot

B.
   tolerations:
   - key: "spot-instance"
     operator: Exists
     effect: NoSchedule

C.
   nodeSelector:
     kubernetes.io/instance-type: spot

D.
   resources:
     requests:
       cpu: "spot"
       memory: "spot"

Solution

Step 1: Understand spot instance labeling in Kubernetes
Spot nodes are usually identified by a node label such as lifecycle=spot. The exact label key and value depend on the cloud provider and on how the cluster was provisioned.
Step 2: Check node affinity syntax
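Option A uses nodeAffinity with requiredDuringSchedulingIgnoredDuringExecution and a matchExpressions term (key "kubernetes.io/lifecycle", operator In, value spot), so the pod can only be scheduled onto nodes labeled as spot. Option B only tolerates a taint and does not select spot nodes by itself, option C selects on an instance-type label rather than a lifecycle label, and option D puts a non-numeric value in resource requests, which is invalid.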
Final Answer:
affinity with nodeSelectorTerms matching lifecycle=spot label -> Option A
Quick Check:
Spot instance selection uses nodeAffinity with lifecycle=spot label [OK]
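
To show how the scheduler reads such a rule, here is a simplified sketch of nodeAffinity matching: terms in nodeSelectorTerms are ORed, and the expressions inside one term are ANDed. It is not the Kubernetes implementation. The function names are hypothetical, only the In and Exists operators are handled, and the label key is the one used in the question, which may differ on your cluster.

```python
def expression_matches(node_labels: dict, expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, operator = expr["key"], expr["operator"]
    if operator == "In":
        return node_labels.get(key) in expr.get("values", [])
    if operator == "Exists":
        return key in node_labels
    # Other operators are left out of this sketch on purpose.
    raise NotImplementedError(f"operator {operator!r} not covered here")


def node_matches(node_labels: dict, node_selector_terms: list) -> bool:
    """A node qualifies if ANY term matches; a term needs ALL its expressions."""
    return any(
        all(expression_matches(node_labels, e) for e in term["matchExpressions"])
        for term in node_selector_terms
    )


SPOT_TERMS = [
    {"matchExpressions": [
        {"key": "kubernetes.io/lifecycle", "operator": "In", "values": ["spot"]},
    ]},
]

if __name__ == "__main__":
    print(node_matches({"kubernetes.io/lifecycle": "spot"}, SPOT_TERMS))
    print(node_matches({"kubernetes.io/lifecycle": "on-demand"}, SPOT_TERMS))
```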

Hint: Use nodeAffinity with lifecycle=spot label for spot nodes [OK]

Common Mistakes:

Using nodeSelector with wrong label key
Setting resource requests to 'spot' (invalid)
Confusing tolerations with node affinity

3. Given this autoscaling configuration snippet for a Kubernetes deployment:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

What happens when average CPU utilization across the pods rises to 75%?

medium

A. The number of pods will increase up to a maximum of 10

B. The number of pods will decrease to 2

C. The deployment will restart pods

D. Nothing changes because CPU target is 50%

Solution

Step 1: Understand Horizontal Pod Autoscaler (HPA) behavior
The HPA adds pod replicas when average CPU utilization, measured as a percentage of each pod's CPU request, exceeds the target.
Step 2: Analyze CPU usage vs target
CPU is at 75%, above the 50% target, so the HPA adds replicas but never exceeds maxReplicas (10).
Final Answer:
The number of pods will increase up to a maximum of 10 -> Option A
Quick Check:
CPU > target utilization triggers pod scaling up [OK]
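
The scaling decision follows the documented HPA rule desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), clamped to minReplicas and maxReplicas. The sketch below applies that rule to the numbers in this question. It is a simplification: the function name is hypothetical, the 10% tolerance is the usual default, and the real controller also applies stabilization windows and readiness checks that are not modeled here.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a utilization metric."""
    ratio = current_utilization / target_utilization
    # Small deviations from the target are ignored to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always kept within [min_replicas, max_replicas].
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 2 replicas at 75% against a 50% target: ceil(2 * 1.5) = 3 replicas.
    print(desired_replicas(2, 75, 50, min_replicas=2, max_replicas=10))
    # 8 replicas at 75% would need 12, which is capped at maxReplicas.
    print(desired_replicas(8, 75, 50, min_replicas=2, max_replicas=10))
```

If utilization stays above the target after a scale-up, the same rule is applied again on the next evaluation, which is how the replica count can climb toward the maximum of 10.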

Hint: CPU above target utilization triggers scaling up [OK]

Common Mistakes:

Thinking pods scale down when CPU rises
Confusing pod restart with scaling
Assuming no change if CPU exceeds target

4. You have a cloud cost alert system but it keeps sending false alarms about overspending. What is the most likely cause?

medium

A. The cloud provider is charging incorrectly

B. The alert thresholds are set too low or too sensitive

C. The system is not connected to the billing API

D. The cost data is updated only once a year

Solution

Step 1: Understand alert system sensitivity
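Alerts trigger when costs exceed the configured thresholds; thresholds set too low fire on normal spending and produce false alarms.
Step 2: Evaluate other options
A billing error, a disconnected billing API, or stale cost data would lead to wrong or missing alerts, not repeated false alarms about overspending.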
Final Answer:
The alert thresholds are set too low or too sensitive -> Option B
Quick Check:
Low alert thresholds cause false alarms [OK]
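
A small comparison illustrates the effect. The sketch below contrasts a fixed threshold set at the typical daily spend with a threshold derived from recent history (mean plus three standard deviations over a 7-day window). The cost figures, window, and multiplier are made-up assumptions for illustration, and the function names are hypothetical.

```python
from statistics import mean, stdev


def fixed_threshold_alerts(daily_costs: list, threshold: float) -> list:
    """Days (by index) on which cost exceeds a fixed threshold."""
    return [day for day, cost in enumerate(daily_costs) if cost > threshold]


def baseline_alerts(daily_costs: list, window: int = 7,
                    sigmas: float = 3.0) -> list:
    """Days on which cost exceeds mean + sigmas * stdev of the prior window."""
    alerts = []
    for day in range(window, len(daily_costs)):
        history = daily_costs[day - window:day]
        limit = mean(history) + sigmas * stdev(history)
        if daily_costs[day] > limit:
            alerts.append(day)
    return alerts


# Hypothetical daily spend: normal noise around 100, one real spike at the end.
DAILY_COSTS = [100, 104, 98, 102, 101, 99, 103, 105, 100, 180]

if __name__ == "__main__":
    print("fixed threshold of 100:", fixed_threshold_alerts(DAILY_COSTS, 100))
    print("history-based threshold:", baseline_alerts(DAILY_COSTS))
```

The fixed threshold fires on ordinary day-to-day variation, while the history-based one fires only on the spike.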

Hint: Check alert thresholds if false alarms occur [OK]

Common Mistakes:

Blaming cloud provider without proof
Ignoring alert configuration
Assuming billing API is always connected

5. You want to reduce costs for a large ML training job that runs daily on cloud GPUs. Which combined strategy best optimizes cost at scale?

hard

A. Run training on CPUs to avoid GPU costs without changing code

B. Use only on-demand GPU instances and disable autoscaling

C. Use spot GPU instances with checkpointing and autoscaling to handle interruptions

D. Schedule training during peak hours to use full capacity

Solution

Step 1: Identify cost-saving options for GPU jobs
Spot instances are cheaper but can be interrupted; checkpointing saves progress.
Step 2: Combine autoscaling with spot instances and checkpointing
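Autoscaling adjusts the number of nodes to what the job needs and replaces interrupted capacity; checkpointing limits lost work to the steps since the last saved checkpoint.
Step 3: Evaluate other options
On-demand GPUs cost more than spot; CPUs are much slower for GPU-suited training, so the job runs longer; peak hours can mean higher spot prices and more interruptions.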
Final Answer:
Use spot GPU instances with checkpointing and autoscaling to handle interruptions -> Option C
Quick Check:
Spot + checkpoint + autoscale = best cost optimization [OK]
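
The checkpoint-and-resume part of this strategy can be sketched as follows. This is a minimal illustration, not a production training loop: the training step is a stand-in, the interruption is simulated with a hypothetical SpotInterruption exception, and the checkpoint is a local JSON file, whereas a real job would write model and optimizer state to durable storage.

```python
import json
import os
import tempfile


class SpotInterruption(Exception):
    """Simulated signal that the spot instance is being reclaimed."""


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def train(path: str, total_steps: int, checkpoint_every: int,
          interrupt_at=None) -> dict:
    """Run (or resume) training, saving progress every `checkpoint_every` steps."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if interrupt_at is not None and step == interrupt_at:
            raise SpotInterruption(f"reclaimed at step {step}")
        # Stand-in for one real training step.
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    try:
        train(ckpt, total_steps=50, checkpoint_every=10, interrupt_at=25)
    except SpotInterruption as exc:
        print("interrupted:", exc, "| resuming from step", load_checkpoint(ckpt)["step"])
    print("finished at step", train(ckpt, total_steps=50, checkpoint_every=10)["step"])
```

In this run the interruption at step 25 costs only the five steps since the checkpoint at step 20, so a shorter checkpoint interval trades more write overhead for less repeated work.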

Hint: Combine spot instances with checkpointing and autoscaling [OK]

Common Mistakes:

Ignoring interruptions on spot instances
Using expensive on-demand only
Running on CPUs without code changes
Scheduling during costly peak hours

Input Size (models x data batches)	Approx. Operations
10 x 10	100
100 x 100	10,000
1000 x 1000	1,000,000

Cost optimization at scale in MLOps - Time & Space Complexity

