Load balancing for AI services in Prompt Engineering / GenAI - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to create a simple round-robin load balancer that selects the next AI service endpoint.

def get_next_endpoint(endpoints, current_index):
    return endpoints[[1]]  # Select next endpoint
A. current_index + 1 % len(endpoints)
B. current_index - 1 % len(endpoints)
C. (current_index + 1) % len(endpoints)
D. current_index % len(endpoints)
Common Mistakes:
  • Forgetting parentheses, causing the wrong order of operations.
  • Not using modulo, leading to an index out of range.
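
The order-of-operations mistake above can be checked directly. The following is a minimal sketch, assuming a hypothetical three-item list of endpoint names (the names are illustrative and not part of the exercise); it compares the index computed with and without parentheses.

```python
endpoints = ["ep-a", "ep-b", "ep-c"]  # hypothetical endpoint names
current_index = 2                      # currently on the last endpoint

# % binds tighter than +, so this evaluates as current_index + (1 % 3) = 3,
# which is past the end of a three-item list.
unparenthesized = current_index + 1 % len(endpoints)

# Parentheses force the addition first: (2 + 1) % 3 = 0, wrapping to the start.
parenthesized = (current_index + 1) % len(endpoints)

print(unparenthesized, parenthesized)  # prints: 3 0
```

Only the parenthesized form stays inside the valid index range for every value of current_index.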
2. Fill in the blank
medium

Complete the code to check if an AI service endpoint is healthy before sending a request.

def is_healthy(endpoint):
    response = send_health_check_request(endpoint)
    return response.status_code == [1]
A. 200
B. 404
C. 500
D. 302
Common Mistakes:
  • Using error codes like 404 or 500 instead of 200.
  • Confusing redirect codes like 302 with success.
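
To see how such a check might be used, here is a small sketch that filters a pool down to its healthy endpoints. The exercise does not define send_health_check_request, so it is replaced here by a hypothetical stub backed by a hard-coded table; the endpoint names and FakeResponse class are also assumptions. Treating exactly 200 as healthy follows the exercise, and a real service may define health differently.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical response object exposing only a status code."""
    status_code: int


# Hypothetical status table standing in for real network calls.
FAKE_STATUS = {"ep-a": 200, "ep-b": 500, "ep-c": 302}


def send_health_check_request(endpoint):
    # Stub: a real implementation would issue an HTTP request here.
    return FakeResponse(FAKE_STATUS[endpoint])


def is_healthy(endpoint):
    # Only a plain 200 counts as healthy; errors and redirects do not.
    return send_health_check_request(endpoint).status_code == 200


healthy = [ep for ep in FAKE_STATUS if is_healthy(ep)]
print(healthy)  # prints: ['ep-a']
```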
3. Fill in the blank
hard

Fix the error in the code that distributes requests evenly across AI service endpoints using modulo.

def distribute_request(request_id, endpoints):
    index = request_id [1] len(endpoints)
    return endpoints[index]
A. %
B. /
C. *
D. -
Common Mistakes:
  • Using division (/) instead of modulo (%).
  • Using multiplication (*) or subtraction (-), which do not wrap indices.
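
The difference between the operators is easy to observe. This sketch, assuming a hypothetical list of three endpoint names and integer request IDs starting at 0, shows modulo wrapping the index and true division failing because it produces a float.

```python
endpoints = ["ep-a", "ep-b", "ep-c"]  # hypothetical endpoint names


def distribute_request(request_id, endpoints):
    # Modulo keeps the index in the range 0 .. len(endpoints) - 1.
    return endpoints[request_id % len(endpoints)]


# Seven consecutive request IDs cycle through the three endpoints.
assigned = [distribute_request(rid, endpoints) for rid in range(7)]
print(assigned)

# True division yields a float (7 / 3), and a list cannot be indexed by a float.
try:
    endpoints[7 / len(endpoints)]
    division_error = None
except TypeError as exc:
    division_error = type(exc).__name__
print(division_error)  # prints: TypeError
```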
4. Fill in the blank
hard

Fill both blanks to create a dictionary comprehension that maps each AI service endpoint to its current load if the load is below 70%.

loads = {endpoint: [1] for endpoint, load in endpoint_loads.items() if load [2] 70}
A. load
B. >
C. <
D. endpoint
Common Mistakes:
  • Using '>' instead of '<', causing wrong filtering.
  • Mapping to 'endpoint' instead of 'load'.
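
A quick run on sample data makes the filtering behaviour concrete. The load values below are made up for illustration and are assumed to be percentages; note that an endpoint at exactly 70 is excluded because the comparison is strict.

```python
# Hypothetical current load per endpoint, in percent.
endpoint_loads = {"ep-a": 35, "ep-b": 82, "ep-c": 70, "ep-d": 12}

# Keep only endpoints whose load is strictly below 70, mapped to that load.
loads = {endpoint: load for endpoint, load in endpoint_loads.items() if load < 70}

print(loads)  # prints: {'ep-a': 35, 'ep-d': 12}
```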
5. Fill in the blank
hard

Fill all three blanks to create a function that selects the AI service endpoint with the lowest load from a dictionary.

def select_least_loaded(endpoint_loads):
    return min(endpoint_loads, key=lambda [1]: endpoint_loads[[2]]) if endpoint_loads else [3]
A. endpoint
C. None
D. load
Common Mistakes:
  • Using wrong variable names in lambda.
  • Not handling empty dictionary case.
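
The empty-dictionary case matters because min() raises ValueError on an empty iterable. The sketch below shows one way the function could be completed, plus an alternative that relies on the default parameter of the built-in min; the sample loads and the name select_least_loaded_default are illustrative assumptions.

```python
def select_least_loaded(endpoint_loads):
    # Guard against an empty dict, since min() on an empty iterable raises ValueError.
    if not endpoint_loads:
        return None
    return min(endpoint_loads, key=lambda endpoint: endpoint_loads[endpoint])


def select_least_loaded_default(endpoint_loads):
    # Alternative: let min() return None itself when there is nothing to compare.
    return min(endpoint_loads, key=endpoint_loads.get, default=None)


sample = {"ep-a": 35, "ep-b": 12, "ep-c": 70}  # hypothetical loads
print(select_least_loaded(sample))        # prints: ep-b
print(select_least_loaded({}))            # prints: None
print(select_least_loaded_default({}))    # prints: None
```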

Practice

1. What is the main purpose of load balancing in AI services?
easy
A. To spread AI requests across multiple servers to keep response times fast
B. To increase the size of AI models automatically
C. To reduce the number of AI users at the same time
D. To store AI data in a single location

Solution

  1. Step 1: Understand load balancing role

    Load balancing distributes incoming AI requests to multiple servers to avoid overload on one server.
  2. Step 2: Identify the benefit

    This spreading keeps the AI service fast and responsive even when many users access it simultaneously.
  3. Final Answer:

    To spread AI requests across multiple servers to keep response times fast -> Option A
  4. Quick Check:

    Load balancing = spreading requests across servers for fast responses [OK]
Hint: Load balancing means sharing work across servers [OK]
Common Mistakes:
  • Thinking load balancing increases model size
  • Believing it reduces user numbers
  • Assuming it stores data in one place
2. Which of the following is a correct simple load balancing method for AI requests?
easy
A. Round-robin, where requests go to servers in order one by one
B. Randomly deleting requests to reduce load
C. Sending all requests to the first server only
D. Increasing request size to slow down processing

Solution

  1. Step 1: Identify simple load balancing methods

    Round-robin sends requests to each server in turn, balancing load evenly.
  2. Step 2: Check other options

    Deleting requests or sending all to one server causes problems, and increasing request size slows service.
  3. Final Answer:

    Round-robin, where requests go to servers in order one by one -> Option A
  4. Quick Check:

    Round-robin = simple balanced request distribution [OK]
Hint: Round-robin cycles through servers evenly [OK]
Common Mistakes:
  • Thinking deleting requests helps load balancing
  • Sending all requests to one server
  • Confusing load balancing with slowing requests
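
Round-robin can be sketched in a few lines. This example uses itertools.cycle from the standard library as one possible way to rotate through servers; the server names are placeholders, and a production balancer would also need to handle concurrency and servers joining or leaving the pool.

```python
from itertools import cycle

servers = ["S1", "S2", "S3"]  # placeholder server names

# cycle() repeats the list endlessly, so each next() call yields the next server in turn.
rotation = cycle(servers)

assignments = [next(rotation) for _ in range(7)]
print(assignments)  # prints: ['S1', 'S2', 'S3', 'S1', 'S2', 'S3', 'S1']
```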
3. Consider this Python code simulating load balancing with round-robin over 3 servers:
servers = ['S1', 'S2', 'S3']
requests = 5
for i in range(requests):
    server = servers[i % len(servers)]
    print(f'Request {i+1} sent to {server}')
What is the output for Request 4?
medium
A. Request 4 sent to S3
B. Request 4 sent to S1
C. Request 4 sent to S2
D. Request 4 sent to S4

Solution

  1. Step 1: Map the request number to the loop index

    The loop variable i starts at 0 and the printed request number is i+1, so Request 4 corresponds to i=3.
  2. Step 2: Compute the server index

    The index is 3 % len(servers) = 3 % 3 = 0, so server = servers[0] = 'S1' and the printed line is 'Request 4 sent to S1'.
  3. Final Answer:

    Request 4 sent to S1 -> Option B
  4. Quick Check:

    Index 3 % 3 = 0, server S1 [OK]
Hint: Use modulo (%) to cycle server index [OK]
Common Mistakes:
  • Off-by-one error in indexing servers
  • Confusing request number with index
  • Assuming server S4 exists
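
Tracing the whole loop confirms the answer and shows the wrap-around. The sketch below reuses the code from the question, collecting the lines in a list (an addition for illustration) so they can be inspected as well as printed.

```python
servers = ['S1', 'S2', 'S3']
requests = 5

# Build the same lines the original loop prints, one per request.
lines = [f'Request {i+1} sent to {servers[i % len(servers)]}' for i in range(requests)]

for line in lines:
    print(line)
# Request 1 sent to S1
# Request 2 sent to S2
# Request 3 sent to S3
# Request 4 sent to S1
# Request 5 sent to S2
```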
4. The following code tries to balance AI requests but has a bug:
servers = ['A', 'B']
requests = ['req1', 'req2', 'req3', 'req4', 'req5']
for i in range(len(requests)):
    server = servers[i // len(servers)]
    print(f'{requests[i]} sent to {server}')
What is the error?
medium
A. The print statement syntax is wrong
B. The servers list is empty
C. Requests list is empty
D. Using integer division (//) instead of modulo (%) causes index error

Solution

  1. Step 1: Analyze the index calculation for server selection

    The code uses i // len(servers), which is integer division. With two servers, the indices for i = 0..4 are 0, 0, 1, 1, 2: requests are sent in blocks rather than in rotation, and at i=4 the index 2 is out of range, raising IndexError.
  2. Step 2: Identify correct operator for cycling

    Modulo (%) should be used to cycle through server indices repeatedly, not integer division.
  3. Final Answer:

    Using integer division (//) instead of modulo (%) causes index error -> Option D
  4. Quick Check:

    Use % to cycle indices, not // [OK]
Hint: Use % for cycling indices, not // [OK]
Common Mistakes:
  • Confusing // with %
  • Assuming empty lists cause error here
  • Thinking print syntax is wrong
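
Running both operators side by side shows where the bug appears. In this sketch the helper assign and its index_for parameter are hypothetical names introduced for the comparison; the servers and requests lists are taken from the question.

```python
servers = ['A', 'B']
requests = ['req1', 'req2', 'req3', 'req4', 'req5']


def assign(index_for):
    """Assign each request using index_for(i); stop and report if indexing fails."""
    assigned = []
    try:
        for i, req in enumerate(requests):
            assigned.append((req, servers[index_for(i)]))
    except IndexError:
        return assigned, "IndexError"
    return assigned, None


# Integer division: indices 0, 0, 1, 1, then 2, which is out of range.
buggy, buggy_error = assign(lambda i: i // len(servers))

# Modulo: indices 0, 1, 0, 1, 0, always valid.
fixed, fixed_error = assign(lambda i: i % len(servers))

print(buggy, buggy_error)
print(fixed, fixed_error)
```

With integer division the first four requests go to A, A, B, B before the fifth fails; with modulo all five alternate between A and B.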
5. You manage an AI service with 4 servers. During peak hours, requests spike to 1000 per minute. Which load balancing strategy best ensures fast responses and avoids server overload?
hard
A. Send all requests to the fastest server only
B. Randomly drop 50% of requests to reduce load
C. Use round-robin to evenly distribute requests across all servers
D. Assign requests only to the first two servers

Solution

  1. Step 1: Understand the problem of request spikes

    High request volume can overload servers if not balanced well, causing slow responses or failures.
  2. Step 2: Evaluate load balancing options

    Round-robin evenly spreads requests, preventing overload. Sending all to one server or only two servers risks overload. Dropping requests reduces service quality.
  3. Final Answer:

    Use round-robin to evenly distribute requests across all servers -> Option C
  4. Quick Check:

    Round-robin = balanced load, fast response [OK]
Hint: Spread requests evenly to avoid overload [OK]
Common Mistakes:
  • Overloading one or two servers
  • Dropping requests unnecessarily
  • Ignoring load balancing benefits
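
The even spread can be verified by counting. This sketch assumes four placeholder server names and models the 1000 requests as consecutive integers; it ignores differences in request cost and server capacity, which a real deployment would have to consider.

```python
from collections import Counter

servers = ["server-1", "server-2", "server-3", "server-4"]  # placeholder names
requests_per_minute = 1000

# Assign each request to a server in rotation and count the assignments.
counts = Counter(servers[i % len(servers)] for i in range(requests_per_minute))

print(counts)  # each of the four servers receives 250 requests
```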