MLOps | DevOps | ~30 mins

Auto-scaling inference endpoints in MLOps - Mini Project: Build & Apply

Auto-scaling Inference Endpoints
📖 Scenario: You work at a company that provides machine learning predictions through an API. The API needs to handle different amounts of user requests at different times. To save money and keep the service fast, you want to automatically adjust the number of servers running the prediction model based on the current request load.
🎯 Goal: Build a simple Python program that simulates the decision step of auto-scaling inference endpoints. You will create a dictionary that holds the current server loads, set a scaling threshold, count the servers whose load exceeds that threshold, and print the resulting count.
📋 What You'll Learn
Create a dictionary called server_loads with exact keys 'server1', 'server2', and 'server3' and their loads as integers: 30, 55, and 20 respectively.
Create a variable called scale_threshold and set it to 50.
Write a for loop using variables server and load to iterate over server_loads.items() and count how many servers have load greater than scale_threshold.
Print the number of servers that need scaling with the exact text format: "Servers to scale: X" where X is the count.
💡 Why This Matters
🌍 Real World
Auto-scaling inference endpoints help cloud services save money and keep response times fast by adjusting resources based on demand.
💼 Career
Understanding auto-scaling is important for DevOps and MLOps roles that manage machine learning services in production.
1
Create the initial server load data
Create a dictionary called server_loads with these exact entries: 'server1': 30, 'server2': 55, and 'server3': 20.
Need a hint?

Use curly braces {} to create a dictionary. Put keys in quotes and values as numbers.
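
One way step 1 could look, as a minimal sketch. The load values are treated here as plain integers; the exercise does not specify a unit, so reading them as percentages is only an assumption.

```python
# Current load per inference server (unit unspecified in the exercise;
# assumed here to be something like percent utilization).
server_loads = {'server1': 30, 'server2': 55, 'server3': 20}

# Quick sanity check: show what was stored.
print(server_loads)
```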

2
Set the scaling threshold
Create a variable called scale_threshold and set it to the integer 50.
Need a hint?

Just assign the number 50 to the variable scale_threshold.
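
A possible version of step 2. It is a single assignment; the comment only records the assumption that the threshold uses the same unit as the values in server_loads.

```python
# Loads strictly above this value count as "needs scaling".
# Assumed to be in the same unit as the values in server_loads.
scale_threshold = 50
```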

3
Count servers exceeding the threshold
Write a for loop using variables server and load to iterate over server_loads.items(). Inside the loop, count how many servers have load greater than scale_threshold. Store the count in a variable called servers_to_scale.
Need a hint?

Start with servers_to_scale = 0. Use a for loop with server, load over server_loads.items(). Use an if to check if load > scale_threshold and increase the count.
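
The counting logic described in step 3 could be sketched as follows. The data and threshold from steps 1 and 2 are repeated so the snippet runs on its own.

```python
# Data and threshold from the earlier steps, repeated for a standalone run.
server_loads = {'server1': 30, 'server2': 55, 'server3': 20}
scale_threshold = 50

# Start the counter at zero, then add one for each overloaded server.
servers_to_scale = 0
for server, load in server_loads.items():
    # Strict comparison: a load exactly equal to the threshold is not counted.
    if load > scale_threshold:
        servers_to_scale += 1
```

With the sample data, only server2 (load 55) exceeds the threshold of 50, so servers_to_scale ends up as 1.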

4
Display the number of servers to scale
Write a print statement to display the number of servers to scale with the exact text format: "Servers to scale: X" where X is the value of servers_to_scale.
Need a hint?

Use an f-string in the print statement to include the variable servers_to_scale inside the text.
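
Putting the four steps together, the whole exercise might look like this. This is one possible solution rather than the only accepted one.

```python
# Step 1: current load per server.
server_loads = {'server1': 30, 'server2': 55, 'server3': 20}

# Step 2: loads strictly above this value trigger scaling.
scale_threshold = 50

# Step 3: count the servers whose load exceeds the threshold.
servers_to_scale = 0
for server, load in server_loads.items():
    if load > scale_threshold:
        servers_to_scale += 1

# Step 4: report the count in the required text format.
print(f"Servers to scale: {servers_to_scale}")
# prints "Servers to scale: 1"
```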