Cost optimization at scale in MLOps - Time & Space Complexity
When managing machine learning operations at scale, it's important to understand how the cost of running tasks grows as the workload increases.
We want to know how the time and resources needed change when we handle more data or models.
Analyze the time complexity of the following cost calculation process.
cost = 0
for model in deployed_models:
    for data_batch in incoming_data:
        cost += compute_cost(model, data_batch)
This code calculates the total cost by checking each deployed model against each batch of incoming data.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Nested loops over models and data batches.
- How many times: For each model, it processes every data batch.
As the number of models or data batches grows, the total cost calculations increase quickly.
| Input Size (models x data batches) | Approx. Operations |
|---|---|
| 10 x 10 | 100 |
| 100 x 100 | 10,000 |
| 1000 x 1000 | 1,000,000 |
Pattern observation: Doubling both inputs quadruples the number of operations, and multiplying both by 10 multiplies the operations by 100, as the table shows.
Time Complexity: O(n * m), where n is the number of deployed models and m is the number of data batches.
This means the time needed grows in proportion to the number of models times the number of data batches.
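To check the growth pattern by hand, here is a minimal sketch that counts how often the inner loop body runs. The function name `count_cost_calls` is hypothetical, and each increment stands in for one `compute_cost(model, data_batch)` call rather than a real cost calculation.

```python
def count_cost_calls(num_models: int, num_batches: int) -> int:
    """Count how many times the inner loop body runs."""
    calls = 0
    for _model in range(num_models):
        for _batch in range(num_batches):
            # Stands in for one compute_cost(model, data_batch) call.
            calls += 1
    return calls


# Same input sizes as the table above.
for n, m in [(10, 10), (100, 100), (1000, 1000)]:
    print(f"{n} x {m} -> {count_cost_calls(n, m)} calls")
```

The counts match the product of the two input sizes, which is what O(n * m) describes.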
[X] Wrong: "The cost grows only with the number of models or data batches, not both together."
[OK] Correct: Because the code checks every model with every data batch, both inputs multiply the work, not just one.
Understanding how nested operations affect cost helps you explain and improve real-world ML system efficiency.
"What if we processed only a fixed number of data batches per model regardless of total batches? How would the time complexity change?"
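One way to explore this question is to cap the inner loop and count the calls again. This is only a sketch: `count_calls_with_cap` and its `cap` parameter are hypothetical names, and `itertools.islice` is used here simply to stop after a fixed number of batches.

```python
from itertools import islice


def count_calls_with_cap(num_models: int, num_batches: int, cap: int) -> int:
    """Count inner-loop runs when each model sees at most `cap` batches."""
    calls = 0
    for _model in range(num_models):
        # islice stops after `cap` items, however many batches exist.
        for _batch in islice(range(num_batches), cap):
            calls += 1
    return calls


print(count_calls_with_cap(10, 1_000, cap=5))
print(count_calls_with_cap(10, 1_000_000, cap=5))
```

Both calls do the same amount of work, which suggests that with a fixed cap the work depends on the number of models alone, that is O(n) when the cap is treated as a constant.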
Practice
Solution
Step 1: Understand cost optimization purpose
Cost optimization means using resources efficiently to reduce expenses.
Step 2: Match resources to workload needs
Adjusting resources based on workload avoids waste and saves money.
Final Answer:
To save money by matching resource use to workload needs -> Option D
Quick Check:
Cost optimization = save money by matching resources [OK]
- Thinking more servers always means better
- Ignoring cost monitoring after deployment
- Assuming expensive resources are always best
Solution
Step 1: Understand spot instance labeling in Kubernetes
Spot instances are often labeled with lifecycle=spot to identify cheaper nodes. The exact label key depends on how the cluster or cloud provider labels its spot nodes.
Step 2: Check node affinity syntax
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "kubernetes.io/lifecycle"
          operator: In
          values:
          - spot
This correctly uses nodeAffinity with matchExpressions to select nodes labeled as spot.
Final Answer:
affinity with nodeSelectorTerms matching lifecycle=spot label -> Option A
Quick Check:
Spot instance selection uses nodeAffinity with lifecycle=spot label [OK]
- Using nodeSelector with wrong label key
- Setting resource requests to 'spot' (invalid)
- Confusing tolerations with node affinity
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
What happens when CPU usage rises to 75%?
Solution
Step 1: Understand Horizontal Pod Autoscaler (HPA) behavior
HPA adds pods when average CPU utilization exceeds the target, spreading the load across more replicas.
Step 2: Analyze CPU usage vs target
CPU is at 75%, above the 50% target, so HPA scales up. It aims for roughly ceil(currentReplicas * 75 / 50) replicas, about 1.5 times the current count, and never goes above maxReplicas (10).
Final Answer:
The number of pods will increase up to a maximum of 10 -> Option A
Quick Check:
CPU > target utilization triggers pod scaling up [OK]
- Thinking pods scale down when CPU rises
- Confusing pod restart with scaling
- Assuming no change if CPU exceeds target
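The scaling decision in this answer can be sketched with the replica formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). The function below is a simplified illustration, not the controller's actual code: the name `desired_replicas` is hypothetical, the 0.1 tolerance is the documented default, and stabilization windows, pod readiness, and missing metrics are ignored.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA replica calculation (assumption: no stabilization logic)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds set in the HPA spec.
    return max(min_replicas, min(max_replicas, desired))


# Values from the manifest above: target 50%, minReplicas 2, maxReplicas 10.
print(desired_replicas(2, 75, 50, 2, 10))  # 3
print(desired_replicas(8, 75, 50, 2, 10))  # 10 (12 capped at maxReplicas)
```

Starting from 2 replicas at 75% CPU, the sketch asks for 3 replicas; from 8 replicas it would ask for 12 and is capped at 10.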
Solution
Step 1: Understand alert system sensitivity
Alerts trigger when costs exceed set thresholds; thresholds that are set too low cause false alarms.
Step 2: Evaluate other options
Incorrect charges or missing billing data cause different issues, not false alarms.
Final Answer:
The alert thresholds are set too low or too sensitive -> Option B
Quick Check:
Low alert thresholds cause false alarms [OK]
- Blaming cloud provider without proof
- Ignoring alert configuration
- Assuming billing API is always connected
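To see how threshold choice drives false alarms, here is a small sketch with made-up numbers. The daily costs, the function `count_alerts`, and the 1.5x-median rule are all illustrative assumptions, not values or methods from any billing tool.

```python
from statistics import median


def count_alerts(daily_costs: list[float], threshold: float) -> int:
    """Count the days whose cost exceeds the alert threshold."""
    return sum(1 for cost in daily_costs if cost > threshold)


# Hypothetical daily spend in USD; only the last day is a genuine spike.
daily_costs = [102.0, 98.5, 110.2, 95.0, 104.8, 99.1, 180.0]

# A threshold near normal spend fires on ordinary fluctuation.
print(count_alerts(daily_costs, threshold=100.0))  # 4

# A threshold tied to the baseline (here 1.5x the median) fires once.
baseline_threshold = 1.5 * median(daily_costs)
print(count_alerts(daily_costs, threshold=baseline_threshold))  # 1
```

With the threshold at 100, three of the four alerts are noise; tying the threshold to a baseline leaves only the real spike.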
Solution
Step 1: Identify cost-saving options for GPU jobs
Spot instances are cheaper but can be interrupted; checkpointing saves progress.
Step 2: Combine autoscaling with spot instances and checkpointing
Autoscaling adjusts resources; checkpointing prevents data loss on interruptions.
Step 3: Evaluate other options
On-demand is costly; CPUs are slower; peak hours usually cost more.
Final Answer:
Use spot GPU instances with checkpointing and autoscaling to handle interruptions -> Option C
Quick Check:
Spot + checkpoint + autoscale = best cost optimization [OK]
- Ignoring interruptions on spot instances
- Using expensive on-demand only
- Running on CPUs without code changes
- Scheduling during costly peak hours
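The checkpointing idea can be illustrated with a toy training loop that resumes from its last saved step. This is a minimal sketch under stated assumptions: `train`, the JSON checkpoint format, and the `interrupt_at` parameter (which simulates a spot reclaim) are hypothetical, and a real job would save model and optimizer state instead of a step counter.

```python
import json
import os
import tempfile


def train(total_steps: int, checkpoint_path: str, interrupt_at=None) -> int:
    """Toy training loop that resumes from checkpoint_path if it exists."""
    step = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            step = json.load(f)["step"]  # resume from the saved progress
    while step < total_steps:
        if interrupt_at is not None and step == interrupt_at:
            return step  # simulated spot interruption
        step += 1  # one unit of training work
        # Write to a temp file, then rename, so a crash mid-write
        # does not leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"step": step}, f)
        os.replace(tmp_path, checkpoint_path)
    return step


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    print(train(10, path, interrupt_at=4))  # 4: interrupted after step 4
    print(train(10, path))                  # 10: resumed at 4, not at 0
```

The second run picks up at step 4, so an interruption costs only the work done since the last checkpoint.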
