Practice

1. What is the main purpose of auto-scaling inference endpoints in ML services?

easy

A. To automatically adjust the number of servers based on traffic

B. To manually add servers when traffic increases

C. To reduce the accuracy of ML models during high traffic

D. To store more data for training models

Solution

Step 1: Understand auto-scaling concept
Auto-scaling means the system changes the number of servers automatically depending on the traffic load.
Step 2: Identify the purpose in ML inference
For ML inference endpoints, auto-scaling adds servers when request traffic rises, which keeps latency low, and removes them when traffic falls, which saves cost. No manual work is needed.
Final Answer:
To automatically adjust the number of servers based on traffic -> Option A
Quick Check:
Auto-scaling = automatic server adjustment [OK]

Hint: Auto-scaling means automatic server count change [OK]

Common Mistakes:

Thinking auto-scaling requires manual server changes
Confusing auto-scaling with model accuracy changes
Believing auto-scaling stores training data
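
The idea can be sketched in a few lines of Python. This is only an illustration, not any platform's API: the helper name servers_needed and the figure of 100 requests per second per server are assumptions made up for the example.

```python
import math


def servers_needed(requests_per_sec: float, capacity_per_server: float) -> int:
    """Hypothetical helper: how many servers the given traffic calls for."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Keep at least one server so the endpoint stays reachable.
    return max(1, math.ceil(requests_per_sec / capacity_per_server))


# Assumed capacity: each server handles 100 requests per second.
for traffic in (40, 250, 90):
    print(f"{traffic} req/s -> {servers_needed(traffic, 100)} server(s)")
```

The server count follows the traffic up and back down with no manual step, which is the behaviour Option A describes.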

2. Which configuration setting defines the minimum number of servers to keep running in an auto-scaling inference endpoint?

easy

A. max_servers

B. scale_up_threshold

C. target_utilization

D. min_servers

Solution

Step 1: Identify minimum server setting
The minimum number of servers to keep running is controlled by the setting named min_servers.
Step 2: Differentiate from other settings
max_servers sets the upper limit on the server count, target_utilization sets the per-server load the autoscaler aims for, and scale_up_threshold is not a setting in this configuration.
Final Answer:
min_servers -> Option D
Quick Check:
Minimum servers = min_servers [OK]

Hint: In this config, the minimum-server setting is the one prefixed with 'min_' [OK]

Common Mistakes:

Confusing max_servers with minimum servers
Mixing target utilization with server count
Using settings that are not part of this configuration, such as scale_up_threshold
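
To make the roles of the three settings concrete, here is a minimal sketch. The field names come from the quiz; the AutoScalingConfig class and its clamp method are hypothetical and do not belong to any real service's SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoScalingConfig:
    """Hypothetical container for the settings named in the question."""
    min_servers: int           # floor: never run fewer servers than this
    max_servers: int           # ceiling: never run more servers than this
    target_utilization: float  # load level to aim for, as a fraction

    def clamp(self, desired: int) -> int:
        """Force a desired server count into [min_servers, max_servers]."""
        return max(self.min_servers, min(self.max_servers, desired))


config = AutoScalingConfig(min_servers=2, max_servers=5, target_utilization=0.7)
print(config.clamp(0))  # held at the min_servers floor
print(config.clamp(9))  # held at the max_servers ceiling
```

Whatever count the scaling policy asks for, min_servers and max_servers bound the final result.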

3. Given this auto-scaling config snippet:

{
  "min_servers": 2,
  "max_servers": 5,
  "target_utilization": 0.7
}

If the current server usage is 80%, what will likely happen?

medium

A. The system will scale up servers to reduce load

B. The system will scale down servers to save cost

C. The system will keep the same number of servers

D. The system will shut down all servers

Solution

Step 1: Compare current usage to target utilization
The current usage (80%) is higher than the target utilization (70%).
Step 2: Determine scaling action
Since usage is above target, the system will add servers (scale up), up to the max_servers limit of 5, to bring utilization back toward the target.
Final Answer:
The system will scale up servers to reduce load -> Option A
Quick Check:
Usage > target = scale up [OK]

Hint: If usage > target, scale up servers [OK]

Common Mistakes:

Scaling down when usage is above target
Assuming no change if usage is slightly above target
Thinking the system shuts down all servers; it never drops below min_servers
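
The comparison can be turned into a small calculation. The quiz does not say which scaling policy the endpoint uses, so this sketch assumes a proportional rule like the one used by the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current * usage / target). The function name desired_servers is hypothetical.

```python
import math


def desired_servers(current: int, usage: float, config: dict) -> int:
    """Assumed proportional rule, clamped to the configured limits."""
    raw = math.ceil(current * usage / config["target_utilization"])
    return max(config["min_servers"], min(config["max_servers"], raw))


config = {"min_servers": 2, "max_servers": 5, "target_utilization": 0.7}

# 2 servers at 80% usage: ceil(2 * 0.8 / 0.7) = ceil(2.29) = 3, so scale up.
print(desired_servers(2, 0.8, config))
# 5 servers at 80% usage: the rule asks for 6, but max_servers caps it at 5.
print(desired_servers(5, 0.8, config))
# 4 servers at 30% usage: the rule asks for 2, which is the min_servers floor.
print(desired_servers(4, 0.3, config))
```

Under this assumption, 80% usage against a 70% target raises the server count unless the endpoint is already at max_servers.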

4. You configured an auto-scaling endpoint with min_servers: 1 and max_servers: 3. The system never scales above 1 server even under high load. What is the most likely cause?

medium

A. The max_servers is set too low to allow scaling

B. The target utilization is set too high, preventing scale up

C. The min_servers value is incorrectly set to 3

D. The system does not support auto-scaling

Solution

Step 1: Analyze scaling limits
Min servers is 1 and max servers is 3, so scaling up to 3 is allowed.
Step 2: Check target utilization impact
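The autoscaler adds servers only when measured utilization exceeds the target. If target utilization is set very high (e.g., 90%+), even heavy load can stay below it, so the system treats the current load as acceptable and won't scale up.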
Final Answer:
The target utilization is set too high, preventing scale up -> Option B
Quick Check:
High target utilization blocks scaling up [OK]

Hint: High target utilization can block scaling up [OK]

Common Mistakes:

Assuming max_servers is too low when it actually allows scaling up to 3
Misreading min_servers as max_servers
Assuming system lacks auto-scaling support
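
A short sketch shows why a very high target hides real load. It assumes the autoscaler scales up only when observed usage exceeds target_utilization. The 90% usage figure and the two target values are made up for illustration.

```python
def should_scale_up(usage: float, target_utilization: float) -> bool:
    """Assumed trigger: scale up only when usage exceeds the target."""
    return usage > target_utilization


observed_usage = 0.90  # hypothetical "high load" reading on the single server

for target in (0.70, 0.95):
    decision = "scale up" if should_scale_up(observed_usage, target) else "stay at 1 server"
    print(f"target={target:.2f}, usage={observed_usage:.2f} -> {decision}")
```

With the target at 0.95, a server running at 90% still counts as under target, so the endpoint stays at one server even though max_servers would allow three.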

5. You want to configure an auto-scaling inference endpoint that never drops below 2 servers, never exceeds 6 servers, and aims to keep CPU usage around 60%. Which configuration is correct?

hard

A. { "min_servers": 2, "max_servers": 6, "target_utilization": 0.9 }

B. { "min_servers": 6, "max_servers": 2, "target_utilization": 0.6 }

C. { "min_servers": 2, "max_servers": 6, "target_utilization": 0.6 }

D. { "min_servers": 1, "max_servers": 6, "target_utilization": 0.6 }

Solution

Step 1: Set minimum and maximum servers correctly
Minimum servers should be 2 and maximum servers 6, so min_servers: 2 and max_servers: 6 are correct.
Step 2: Set target utilization to 60%
Target utilization should be 0.6 (60%) to keep CPU usage around that level.
Step 3: Verify options
Option C matches all requirements.
Option B reverses min_servers and max_servers.
Option A sets target_utilization to 0.9 instead of 0.6.
Option D sets min_servers to 1, which is below the required 2.
Final Answer:
{ "min_servers": 2, "max_servers": 6, "target_utilization": 0.6 } -> Option C
Quick Check:
min_servers 2, max_servers 6, target_utilization 0.6 = Option C [OK]

Hint: Min ≤ max and target_utilization as decimal (0.6) [OK]

Common Mistakes:

Swapping min_servers and max_servers values
Using target_utilization as percentage (60) instead of decimal (0.6)
Setting min_servers lower than required
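
The same checks can be run mechanically against the four options. The meets_requirements function below is a hypothetical helper written for this question, not a validator from any real platform.

```python
import math


def meets_requirements(config: dict) -> bool:
    """Check a config against the question: floor 2, ceiling 6, target 60%."""
    target = config["target_utilization"]
    return (
        config["min_servers"] == 2
        and config["max_servers"] == 6
        and config["min_servers"] <= config["max_servers"]
        and 0 < target <= 1                # must be a fraction, not 60
        and math.isclose(target, 0.6)
    )


options = {
    "A": {"min_servers": 2, "max_servers": 6, "target_utilization": 0.9},
    "B": {"min_servers": 6, "max_servers": 2, "target_utilization": 0.6},
    "C": {"min_servers": 2, "max_servers": 6, "target_utilization": 0.6},
    "D": {"min_servers": 1, "max_servers": 6, "target_utilization": 0.6},
}

correct = [label for label, cfg in options.items() if meets_requirements(cfg)]
print(correct)
```

Only Option C passes every check. A config that writes the target as 60 fails the fraction check.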
