Auto-scaling Inference Endpoints
📖 Scenario: You work at a company that provides machine learning predictions through an API. Request traffic varies throughout the day, so to save money and keep the service fast, you want to automatically adjust the number of servers running the prediction model based on the current request load.
🎯 Goal: Build a simple Python program that simulates auto-scaling of inference endpoints. You will create a data structure to hold current server loads, set a threshold for scaling, write logic to decide when to add or remove servers, and finally display the updated server count.
📋 What You'll Learn
1. Create a dictionary called server_loads with exact keys 'server1', 'server2', and 'server3' and their loads as integers: 30, 55, and 20 respectively.
2. Create a variable called scale_threshold and set it to 50.
3. Write a for loop using variables server and load to iterate over server_loads.items() and count how many servers have load greater than scale_threshold.
4. Print the number of servers that need scaling with the exact text format: "Servers to scale: X" where X is the count.

💡 Why This Matters
🌍 Real World
Auto-scaling inference endpoints help cloud services save money and keep response times fast by adjusting resources based on demand.
💼 Career
Understanding auto-scaling is important for DevOps and MLOps roles that manage machine learning services in production.
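Putting the four steps together, a minimal sketch of the finished program might look like this (the names server_loads, scale_threshold, server, and load come from the exercise itself; treating the loads as percentages is an assumption for the comments):

```python
# Step 1: current load per server (assumed to be a percentage)
server_loads = {'server1': 30, 'server2': 55, 'server3': 20}

# Step 2: any server above this load should be scaled
scale_threshold = 50

# Step 3: count servers whose load exceeds the threshold
servers_to_scale = 0
for server, load in server_loads.items():
    if load > scale_threshold:
        servers_to_scale += 1

# Step 4: report the count in the exact required format
print(f"Servers to scale: {servers_to_scale}")  # Servers to scale: 1
```

With the given loads, only server2 (55) exceeds the threshold of 50, so the program prints a count of 1.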