Load balancing helps AI services handle many requests efficiently. What is its main goal?
Think about how to avoid one server getting too busy while others are idle.
Load balancing spreads requests evenly to avoid any single server becoming a bottleneck, ensuring smooth and fast AI service.
Consider this Python code simulating round-robin load balancing for AI requests:
servers = ['AI1', 'AI2', 'AI3']
requests = 5
assignments = []
for i in range(requests):
    server = servers[i % len(servers)]
    assignments.append(server)
print(assignments)

What is the printed output?
Look at how the modulo operator cycles through the server list.
The modulo operator (%) cycles through the list indices 0, 1, 2, 0, 1, assigning requests in order, so the code prints ['AI1', 'AI2', 'AI3', 'AI1', 'AI2'].
When deploying AI models behind a load balancer, which serving parameter most directly impacts how well the load is balanced?
Think about what controls how many requests a server handles at once.
Batch size controls how many requests a server processes together; it affects throughput and how evenly work is spread, since larger batches deliver load to each server in coarser chunks.
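As a rough illustration (the server names and helper function here are made up for this answer, not part of the quiz code), a sketch of how batch size shapes per-server assignments:

```python
def assign_in_batches(num_requests, servers, batch_size):
    """Assign requests to servers in round-robin batches of batch_size."""
    assignments = []
    for i in range(num_requests):
        # Consecutive requests in the same batch go to the same server.
        batch_index = i // batch_size
        assignments.append(servers[batch_index % len(servers)])
    return assignments

servers = ['AI1', 'AI2', 'AI3']
print(assign_in_batches(6, servers, 1))  # fine-grained: one request per server per turn
print(assign_in_batches(6, servers, 2))  # coarser: two requests land on each server at a time
```

With batch_size = 1 this reduces to plain round-robin; larger batch sizes trade per-request overhead for burstier load on each server.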
You monitor AI servers behind a load balancer. Which metric best shows if load balancing is working well?
Good load balancing means servers work evenly. What shows uneven work?
Variance in per-server CPU usage shows how evenly the load is spread: low variance means the servers are sharing work evenly, while high variance means some servers are working much harder than others.
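A minimal sketch of this check, using Python's standard library (the CPU-usage percentages below are hypothetical sample readings):

```python
import statistics

# Hypothetical CPU-usage samples (percent) for three servers.
balanced = [62, 60, 61]
imbalanced = [95, 30, 58]

# Population variance is near zero when servers share work evenly
# and grows quickly when one server carries most of the load.
print(statistics.pvariance(balanced))
print(statistics.pvariance(imbalanced))
```

In practice you would feed this the latest utilization readings from your monitoring system and alert when the variance crosses a threshold.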
Review this Python code snippet for load balancing AI requests:
servers = ['AI1', 'AI2', 'AI3']
requests = 6
assignments = []
for i in range(requests):
    server = servers[(i // 2) % len(servers)]
    assignments.append(server)
print(assignments)

Why does this code cause uneven load distribution?
Look at how the index changes with integer division.
The integer division (i // 2) groups requests in pairs, so the code prints ['AI1', 'AI1', 'AI2', 'AI2', 'AI3', 'AI3']: each server receives two requests back to back. The per-server totals happen to be equal here, but the bursty arrival pattern can momentarily overload one server while the others sit idle, and with a request count that is not a multiple of 2 * len(servers) the totals skew as well.
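One way to see the burstiness (a small helper written for this answer, not part of the quiz code) is to measure the longest streak of consecutive requests landing on the same server:

```python
def longest_run(assignments):
    """Length of the longest streak of consecutive requests on one server."""
    longest = streak = 1
    for prev, cur in zip(assignments, assignments[1:]):
        streak = streak + 1 if cur == prev else 1
        longest = max(longest, streak)
    return longest

servers = ['AI1', 'AI2', 'AI3']
round_robin = [servers[i % len(servers)] for i in range(6)]
paired = [servers[(i // 2) % len(servers)] for i in range(6)]

print(round_robin, longest_run(round_robin))  # streaks of length 1
print(paired, longest_run(paired))            # streaks of length 2
```

A longer streak means a server absorbs more back-to-back work before the balancer moves on, which is exactly the burst pattern the question describes.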