
Load balancing for AI services in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main purpose of load balancing in AI services?

Load balancing helps AI services handle many requests efficiently. What is its main goal?

A. To reduce the accuracy of AI predictions to speed up processing
B. To increase the size of the AI model automatically during high demand
C. To evenly distribute incoming requests across multiple AI servers to prevent overload
D. To store all AI data in a single server for faster access
💡 Hint

Think about how to avoid one server getting too busy while others are idle.

Predict Output
intermediate
What output does this load balancing code produce?

Consider this Python code simulating round-robin load balancing for AI requests:

servers = ['AI1', 'AI2', 'AI3']
requests = 5
assignments = []
for i in range(requests):
    server = servers[i % len(servers)]
    assignments.append(server)
print(assignments)

What is the printed output?

A. ['AI1', 'AI1', 'AI1', 'AI1', 'AI1']
B. ['AI3', 'AI2', 'AI1', 'AI3', 'AI2']
C. ['AI2', 'AI3', 'AI1', 'AI2', 'AI3']
D. ['AI1', 'AI2', 'AI3', 'AI1', 'AI2']
💡 Hint

Look at how the modulo operator cycles through the server list.

Hyperparameter
advanced
Which hyperparameter affects load balancing efficiency in AI model serving?

When deploying AI models behind a load balancer, which hyperparameter most directly impacts how well the load is balanced?

A. Batch size of requests processed by each AI server
B. Learning rate of the AI model during training
C. Number of layers in the AI model architecture
D. Dropout rate used in the AI model
💡 Hint

Think about what controls how many requests a server handles at once.

Metrics
advanced
Which metric best indicates load balancing effectiveness in AI services?

You monitor AI servers behind a load balancer. Which metric best shows if load balancing is working well?

A. Total number of AI model parameters
B. Variance in CPU usage across all AI servers
C. Training accuracy of the AI model
D. Size of the AI model file on disk
💡 Hint

Good load balancing means servers work evenly. What shows uneven work?

🔧 Debug
expert
Why does this AI load balancer code cause uneven request distribution?

Review this Python code snippet for load balancing AI requests:

servers = ['AI1', 'AI2', 'AI3']
requests = 6
assignments = []
for i in range(requests):
    server = servers[(i // 2) % len(servers)]
    assignments.append(server)
print(assignments)

Each server ends up with the same total number of requests, so why is the load still uneven over time?

A. Because each server gets two consecutive requests before switching, causing bursts
B. Because the modulo operator is used incorrectly, causing index errors
C. Because the loop runs fewer times than the number of requests
D. Because the servers list is empty, causing an error
💡 Hint

Look at how the index changes with integer division.

Practice

1. What is the main purpose of load balancing in AI services?
easy
A. To spread AI requests across multiple servers to keep response times fast
B. To increase the size of AI models automatically
C. To reduce the number of AI users at the same time
D. To store AI data in a single location

Solution

  1. Step 1: Understand load balancing role

    Load balancing distributes incoming AI requests to multiple servers to avoid overload on one server.
  2. Step 2: Identify the benefit

    This spreading keeps the AI service fast and responsive even when many users access it simultaneously.
  3. Final Answer:

    To spread AI requests across multiple servers to keep response times fast -> Option A
  4. Quick Check:

    Load balancing = spreading requests for fast responses [OK]
Hint: Load balancing means sharing work across servers [OK]
Common Mistakes:
  • Thinking load balancing increases model size
  • Believing it reduces user numbers
  • Assuming it stores data in one place
2. Which of the following is a correct simple load balancing method for AI requests?
easy
A. Round-robin, where requests go to servers in order one by one
B. Randomly deleting requests to reduce load
C. Sending all requests to the first server only
D. Increasing request size to slow down processing

Solution

  1. Step 1: Identify simple load balancing methods

    Round-robin sends requests to each server in turn, balancing load evenly.
  2. Step 2: Check other options

    Deleting requests discards user work, sending every request to one server overloads it, and increasing request size only slows the service.
  3. Final Answer:

    Round-robin, where requests go to servers in order one by one -> Option A
  4. Quick Check:

    Round-robin = simple balanced request distribution [OK]
Hint: Round-robin cycles through servers evenly [OK]
Common Mistakes:
  • Thinking deleting requests helps load balancing
  • Sending all requests to one server
  • Confusing load balancing with slowing requests
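As a minimal sketch of the round-robin method described in the solution above, the rotation can be written with `itertools.cycle` from the Python standard library. The helper name `assign_round_robin` and the server names are hypothetical, chosen only for illustration.

```python
from itertools import cycle


def assign_round_robin(servers, num_requests):
    """Return the server chosen for each request, rotating through the list in order."""
    rotation = cycle(servers)  # endless iterator over the servers, restarting after the last one
    return [next(rotation) for _ in range(num_requests)]


# Hypothetical server names, used only for this illustration.
assignments = assign_round_robin(['gpu-a', 'gpu-b'], 5)
print(assignments)  # ['gpu-a', 'gpu-b', 'gpu-a', 'gpu-b', 'gpu-a']
```

With 5 requests and 2 servers the counts cannot be exactly equal, but they never differ by more than one.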
3. Consider this Python code simulating load balancing with round-robin over 3 servers:
servers = ['S1', 'S2', 'S3']
requests = 5
for i in range(requests):
    server = servers[i % len(servers)]
    print(f'Request {i+1} sent to {server}')
What is the output for Request 4?
medium
A. Request 4 sent to S3
B. Request 4 sent to S1
C. Request 4 sent to S2
D. Request 4 sent to S4

Solution

  1. Step 1: Understand the round-robin index calculation

    The loop variable i starts at 0, so Request 4 corresponds to i = 3, and the server index is 3 % 3 = 0.
  2. Step 2: Check the printed output for request 4

    servers[0] is 'S1', and the f-string prints i+1 as the request number, so the output line is 'Request 4 sent to S1'.
  3. Final Answer:

    Request 4 sent to S1 -> Option B
  4. Quick Check:

    Index 3 % 3 = 0, server S1 [OK]
Hint: Use modulo (%) to cycle server index [OK]
Common Mistakes:
  • Off-by-one error in indexing servers
  • Confusing request number with index
  • Assuming server S4 exists
4. The following code tries to balance AI requests but has a bug:
servers = ['A', 'B']
requests = ['req1', 'req2', 'req3', 'req4', 'req5']
for i in range(len(requests)):
    server = servers[i // len(servers)]
    print(f'{requests[i]} sent to {server}')
What is the error?
medium
A. The print statement syntax is wrong
B. The servers list is empty
C. Requests list is empty
D. Using integer division (//) instead of modulo (%) causes index error

Solution

  1. Step 1: Analyze the index calculation for server selection

    The code uses i // len(servers), which is integer division. With two servers, the indices for i = 0 to 4 are 0, 0, 1, 1, 2. The list only has indices 0 and 1, so at i = 4 (req5) Python raises an IndexError.
  2. Step 2: Identify correct operator for cycling

    Modulo (%) should be used to cycle through server indices repeatedly, not integer division.
  3. Final Answer:

    Using integer division (//) instead of modulo (%) causes index error -> Option D
  4. Quick Check:

    Use % to cycle indices, not // [OK]
Hint: Use % for cycling indices, not // [OK]
Common Mistakes:
  • Confusing // with %
  • Assuming empty lists cause error here
  • Thinking print syntax is wrong
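The fix named in this solution can be sketched as follows, assuming the same two servers and five requests as in the question. Only the index expression changes from `//` to `%`; the `buggy_indices` and `routed` lists are added here purely to make the difference visible.

```python
servers = ['A', 'B']
requests = ['req1', 'req2', 'req3', 'req4', 'req5']

# Index from the question: i // len(servers) yields 0, 0, 1, 1, 2,
# and index 2 does not exist in a two-element list.
buggy_indices = [i // len(servers) for i in range(len(requests))]

# Corrected index: i % len(servers) always stays within 0..len(servers)-1.
routed = []
for i, request in enumerate(requests):
    server = servers[i % len(servers)]
    routed.append((request, server))
    print(f'{request} sent to {server}')
```

The corrected loop alternates between 'A' and 'B' and handles any number of requests without running past the end of the list.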
5. You manage an AI service with 4 servers. During peak hours, requests spike to 1000 per minute. Which load balancing strategy best ensures fast responses and avoids server overload?
hard
A. Send all requests to the fastest server only
B. Randomly drop 50% of requests to reduce load
C. Use round-robin to evenly distribute requests across all servers
D. Assign requests only to the first two servers

Solution

  1. Step 1: Understand the problem of request spikes

    High request volume can overload servers if not balanced well, causing slow responses or failures.
  2. Step 2: Evaluate load balancing options

    Round-robin evenly spreads requests, preventing overload. Sending all to one server or only two servers risks overload. Dropping requests reduces service quality.
  3. Final Answer:

    Use round-robin to evenly distribute requests across all servers -> Option C
  4. Quick Check:

    Round-robin = balanced load, fast response [OK]
Hint: Spread requests evenly to avoid overload [OK]
Common Mistakes:
  • Overloading one or two servers
  • Dropping requests unnecessarily
  • Ignoring load balancing benefits
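To make the comparison in this solution concrete, here is a rough sketch that counts how many of the 1000 requests each of the 4 servers would receive under three of the strategies, and reports the population variance of those counts as a simple evenness measure. The server names and function names are hypothetical, the "fastest server" is arbitrarily assumed to be the first one, and the sketch counts requests only. It does not model request cost or server speed.

```python
from collections import Counter
from statistics import pvariance

SERVERS = ['S1', 'S2', 'S3', 'S4']  # hypothetical server names
NUM_REQUESTS = 1000


def round_robin(i):
    """Option C: rotate through all servers."""
    return SERVERS[i % len(SERVERS)]


def single_server(i):
    """Option A: every request goes to one server (assumed here to be S1)."""
    return SERVERS[0]


def first_two_only(i):
    """Option D: alternate between the first two servers only."""
    return SERVERS[i % 2]


def load_per_server(strategy):
    """Count requests per server, in SERVERS order (0 for unused servers)."""
    counts = Counter(strategy(i) for i in range(NUM_REQUESTS))
    return [counts[name] for name in SERVERS]


for strategy in (round_robin, single_server, first_two_only):
    loads = load_per_server(strategy)
    # Lower variance means the requests are spread more evenly.
    print(strategy.__name__, loads, 'variance:', pvariance(loads))
```

Under these assumptions round-robin gives every server 250 requests and a variance of zero, while the other two strategies leave some servers idle and others carrying the whole load.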