Recall & Review
beginner
What is load balancing in AI services?
Load balancing is the process of distributing incoming AI service requests across multiple servers or resources so that no single server is overwhelmed, which improves response time and reliability.
beginner
Why is load balancing important for AI services?
It helps keep AI services fast and available by preventing any one server from getting too busy, which can cause delays or crashes.
intermediate
Name two common load balancing methods used in AI services.
Round Robin (requests go to servers in order) and Least Connections (requests go to the server with the fewest active connections).
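The two methods named above can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer implements them; the class names `RoundRobinBalancer` and `LeastConnectionsBalancer` and the server labels are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        # Each call returns the next server in the cycle.
        return next(self._servers)


class LeastConnectionsBalancer:
    """Hands out the server with the fewest active connections."""

    def __init__(self, servers):
        # Active connection count per server, all starting at zero.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Choose the least-busy server; ties go to the earliest listed one.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count stays accurate.
        self.active[server] -= 1
```

Round Robin ignores how busy each server is, which is simple but can overload a server stuck on slow requests. Least Connections tracks in-flight work, which may suit AI inference better when request durations vary widely.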
intermediate
How does load balancing improve fault tolerance in AI services?
If one server fails, load balancers redirect requests to other healthy servers, keeping the AI service running smoothly.
intermediate
What role does health checking play in load balancing for AI services?
Health checks periodically probe each server to confirm it is responding correctly; load balancers use the results to avoid sending requests to servers that are down or too slow.
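As a rough sketch of how health checks and failover fit together, the balancer below skips any server that failed its most recent check. The class name `HealthAwareBalancer` and the `check` callable are assumptions made for illustration; a real health check would typically be a network probe run on a timer.

```python
class HealthAwareBalancer:
    """Round-robin balancer that skips servers marked unhealthy."""

    def __init__(self, servers, check):
        self.servers = list(servers)
        self.check = check  # hypothetical callable: server -> bool
        self.healthy = set(self.servers)  # assume healthy until checked
        self._next = 0

    def run_health_checks(self):
        # Re-evaluate every server and keep only those that pass.
        self.healthy = {s for s in self.servers if self.check(s)}

    def pick(self):
        # Try each server at most once, starting from the current position.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

When a server fails its check, requests simply flow to the remaining healthy ones; once it passes again, the next `run_health_checks` call puts it back in rotation.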
What does load balancing do in AI services?
Load balancing spreads requests across multiple servers to keep AI services fast and reliable.
Which load balancing method sends requests to the server with the fewest active connections?
Least Connections sends requests to the server with the fewest active connections to balance load.
How does load balancing help if one AI server crashes?
Load balancers redirect requests to healthy servers to keep the service running.
What is a health check in load balancing?
Health checks monitor server status so the load balancer can avoid sending requests to servers that are down or unresponsive.
Which of these is NOT a benefit of load balancing for AI services?
Load balancing reduces overload-related crashes by spreading load across servers; it does not increase them.
Explain in your own words why load balancing is important for AI services.
Think about what happens if one server gets too many requests.
Describe two common methods of load balancing and how they decide where to send requests.
One method cycles through the servers in order; the other checks how many active connections each server has.