| Users | Requests per Second (RPS) | Routing Complexity | Load Balancer Setup | Network Traffic |
|---|---|---|---|---|
| 100 users | ~50 RPS | Simple routing, single load balancer | One load balancer instance | Low, easily handled by single server |
| 10,000 users | ~5,000 RPS | Multiple microservices, routing rules grow | Multiple load balancers with health checks | Moderate, requires monitoring |
| 1,000,000 users | ~500,000 RPS | Complex routing, service discovery needed | Distributed load balancers, global traffic management | High, needs optimized network and CDN |
| 100,000,000 users | ~50,000,000 RPS | Highly dynamic routing, multi-region failover | Hierarchical load balancing, edge routing, global DNS | Very high, requires advanced network infra |
Routing and load balancing in Microservices - Scalability & System Analysis
At small scale, the load balancer's CPU and memory are the first bottleneck, because a single instance must accept every incoming request and route it correctly. As traffic grows, routing logic and service discovery lookups can slow down and add delay to each request. At medium scale, network bandwidth and latency between load balancers and microservices become critical. At large scale, global routing and failover become the bottleneck unless they are distributed across regions.
- Horizontal Scaling: Add more load balancer instances behind a DNS or anycast IP to distribute traffic.
- Service Discovery: Use dynamic service registries to keep routing updated without manual config.
- Caching: Cache routing decisions or DNS lookups to reduce latency.
- Sharding: Partition traffic by user region or service type to reduce load per balancer.
- CDN and Edge Routing: Offload static content and route users to nearest data center.
- Global Load Balancing: Use DNS-based or geo-aware load balancing for multi-region failover.
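The service discovery item above can be sketched as a tiny in-memory registry. This is only an illustration under stated assumptions: the `ServiceRegistry` class, its method names, and the addresses are hypothetical, and a production registry would also need health checks, expiry, and replication.

```python
from collections import defaultdict


class ServiceRegistry:
    """In-memory stand-in for a dynamic service registry (hypothetical)."""

    def __init__(self):
        # service name -> set of "host:port" addresses currently registered
        self._instances = defaultdict(set)

    def register(self, service, address):
        self._instances[service].add(address)

    def deregister(self, service, address):
        self._instances[service].discard(address)

    def lookup(self, service):
        # Sorted so callers see a stable order between calls.
        return sorted(self._instances[service])


registry = ServiceRegistry()
registry.register("orders", "10.0.0.1:8080")
registry.register("orders", "10.0.0.2:8080")
registry.deregister("orders", "10.0.0.1:8080")
print(registry.lookup("orders"))  # ['10.0.0.2:8080']
```

A router that calls `lookup` on each request (or on a short refresh interval) picks up new and removed instances without a manual config change.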
Assuming 1 million users generate ~500,000 RPS:
- Each load balancer can handle ~5,000 concurrent connections and ~10,000 RPS.
- Number of load balancers needed: 500,000 / 10,000 = 50 instances minimum.
- Network bandwidth: If average request size is 10 KB, total bandwidth = 500,000 * 10 KB = ~5 GB/s (~40 Gbps).
- Storage is minimal for routing but logs and metrics storage grows with traffic.
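The arithmetic above can be reproduced in a few lines, which makes it easy to vary the inputs. The three input figures come from the estimate itself; the 30% headroom is an added assumption, not part of the original estimate, included only to show how a safety margin changes the instance count.

```python
import math

total_rps = 500_000          # from the estimate above
rps_per_balancer = 10_000    # from the estimate above
request_size_kb = 10         # from the estimate above

balancers = math.ceil(total_rps / rps_per_balancer)
# Decimal units: 1 GB = 1,000,000 KB.
bandwidth_gb_per_s = total_rps * request_size_kb / 1_000_000
bandwidth_gbps = bandwidth_gb_per_s * 8

# Assumption (not in the source): keep 30% spare capacity for failover and spikes.
headroom = 0.30
balancers_with_headroom = math.ceil(balancers / (1 - headroom))

print(balancers)                # 50
print(bandwidth_gb_per_s)       # 5.0
print(bandwidth_gbps)           # 40.0
print(balancers_with_headroom)  # 72
```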
Start by explaining the role of routing and load balancing in microservices. Discuss how traffic grows and what breaks first. Then, describe scaling strategies step-by-step: horizontal scaling, service discovery, caching, and global load balancing. Use real numbers to show understanding of capacity and bottlenecks.
Your database handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: Since the database is the bottleneck at 1000 QPS, and traffic grows to 10,000 QPS, the first step is to add read replicas and implement caching to reduce direct database load before scaling application servers or load balancers.
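One way to put a number on the caching part of this answer is to ask what cache hit ratio keeps the database at its current load. The sketch below assumes, for simplicity, that every query is a cacheable read and that only cache misses reach the database; `required_hit_ratio` is a hypothetical helper, not a library function.

```python
def required_hit_ratio(total_qps, db_capacity_qps):
    """Smallest cache hit ratio that keeps database traffic within capacity."""
    if total_qps <= db_capacity_qps:
        return 0.0  # the database can absorb everything without a cache
    # Misses reach the database: total_qps * (1 - hit_ratio) <= db_capacity_qps
    return 1 - db_capacity_qps / total_qps


ratio = required_hit_ratio(total_qps=10_000, db_capacity_qps=1_000)
print(f"{ratio:.0%}")  # 90%
```

Under these assumptions a 90% hit ratio holds the database at 1,000 QPS; read replicas lower the required ratio by raising read capacity.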
Practice
Solution
Step 1: Understand routing role
Routing directs incoming requests to the right microservice based on predefined rules like URL paths or headers.
Step 2: Differentiate routing from other functions
Storing data, encrypting communication, and monitoring are separate concerns handled by databases, security layers, and monitoring tools respectively.
Final Answer:
To send requests to the correct microservice based on rules -> Option D
Quick Check:
Routing = directing requests [OK]
- Confusing routing with data storage
- Mixing routing with security or monitoring
- Thinking routing balances load
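To make the distinction concrete, here is a minimal sketch of rule-based routing by URL path. The routing table and service names are hypothetical; the function only decides which service a request belongs to and does no load balancing.

```python
# Hypothetical routing table: URL path prefix -> service name.
ROUTES = {
    "/api/orders": "order-service",
    "/api/users": "user-service",
    "/api": "gateway-fallback",
}


def route(path):
    """Return the service for the longest matching path prefix, or None."""
    matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    if not matches:
        return None
    return ROUTES[max(matches, key=len)]


print(route("/api/orders/42"))    # order-service
print(route("/api/health"))       # gateway-fallback
print(route("/static/logo.png"))  # None
```

Choosing among the instances of the selected service is a separate step, which is where load balancing comes in.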
Solution
Step 1: Identify common load balancing syntax
Round robin is a standard load balancing method cycling through instances evenly, often expressed as a list.
Step 2: Evaluate options for correct syntax style
round_robin: [instance1, instance2, instance3] uses a clear list with round_robin keyword, matching common config styles. Others use invalid or uncommon syntax.
Final Answer:
round_robin: [instance1, instance2, instance3] -> Option A
Quick Check:
Round robin uses list syntax [OK]
- Using semicolons instead of commas
- Incorrect assignment operators
- Using arrows or pipes incorrectly
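The behavior behind that config line can be sketched with the standard library. The instance names are taken from the answer above; treating them as plain strings is an assumption made for illustration.

```python
from itertools import cycle

instances = ["instance1", "instance2", "instance3"]
next_instance = cycle(instances)  # endless iterator over the list, in order

# Route six requests; each instance appears exactly twice.
assigned = [next(next_instance) for _ in range(6)]
print(assigned)
# ['instance1', 'instance2', 'instance3', 'instance1', 'instance2', 'instance3']
```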
weights = {"serviceA": 3, "serviceB": 1}
requests = 8
for i in range(requests):
    # weighted_choice is assumed to pick a service with probability proportional to its weight
    target = weighted_choice(weights)
    print(target)
What is the expected number of requests routed to serviceA?
Solution
Step 1: Understand weighted routing concept
Weights define how many times a service should receive requests relative to others. ServiceA has weight 3, serviceB has weight 1, total weight is 4.
Step 2: Calculate expected requests for serviceA
Out of 8 requests, serviceA should get (3/4)*8 = 6 requests on average.
Final Answer:
6 -> Option A
Quick Check:
Weighted share = 6 requests [OK]
- Ignoring weights and dividing requests equally
- Confusing total weight with individual weights
- Calculating requests for serviceB instead
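The question leaves `weighted_choice` undefined. The sketch below supplies one plausible implementation using `random.choices`, which is an assumption about how the helper behaves, and checks the expected share both analytically and by simulation.

```python
import random
from collections import Counter

weights = {"serviceA": 3, "serviceB": 1}


def weighted_choice(weights, rng=random):
    """Pick one key with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def expected_share(weights, service, requests):
    """Expected request count: weight / total weight * requests."""
    return weights[service] / sum(weights.values()) * requests


print(expected_share(weights, "serviceA", 8))  # 6.0

# A single run of 8 requests can deviate from 6; a long run converges on 75%.
rng = random.Random(0)
trials = 100_000
counts = Counter(weighted_choice(weights, rng) for _ in range(trials))
print(counts["serviceA"] / trials)
```

Because the choice is random, 6 is an average rather than a guarantee for any particular batch of 8 requests.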
if (instance.isHealthy()) {
    forwardRequest(instance)
} else {
    skipInstance(instance)
}
However, requests are still being sent to unhealthy instances. What is the most likely cause?
Solution
Step 1: Analyze health check integration
The code shows a health check condition, but if the load balancer does not actually use this logic, unhealthy instances may still receive traffic.
Step 2: Evaluate other options for relevance
Round robin vs weighted routing does not affect health checks. Overload does not mark instances unhealthy. URL path matching is unrelated to health status.
Final Answer:
Health check logic is not integrated with the load balancer -> Option B
Quick Check:
Health check integration = key [OK]
- Assuming routing method affects health checks
- Confusing overload with health status
- Ignoring missing integration of health logic
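A minimal sketch of what "integrated" means here: the health filter sits inside the balancer's own selection path, so no caller can bypass it. The `Instance` and `LoadBalancer` classes are hypothetical, and health is modeled as a simple flag rather than a real probe.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    name: str
    healthy: bool = True


class LoadBalancer:
    def __init__(self, instances):
        self.instances = instances
        self._counter = count()

    def pick(self):
        # The health filter runs inside the selection path, so an unhealthy
        # instance can never be returned.
        healthy = [i for i in self.instances if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy instances available")
        return healthy[next(self._counter) % len(healthy)]


lb = LoadBalancer([Instance("a"), Instance("b", healthy=False), Instance("c")])
picked = [lb.pick().name for _ in range(4)]
print(picked)  # ['a', 'c', 'a', 'c']
```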
Solution
Step 1: Identify routing needs for user requests and jobs
User requests require path-based routing to separate them from background jobs, which need different load balancing strategies.
Step 2: Choose architecture supporting both routing and load balancing rules
A single load balancer with path-based routing can direct traffic to two target groups. One group uses round robin for user requests, the other weighted for jobs, meeting all requirements efficiently.
Final Answer:
Use a single load balancer with path-based routing directing to two target groups; one uses round robin, the other weighted balancing -> Option C
Quick Check:
Path-based routing + mixed balancing = Option C [OK]
- Using weighted balancing for user requests instead of round robin
- Splitting with DNS which lacks path awareness
- Routing all traffic to one instance causing bottlenecks
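The chosen architecture can be sketched as one entry point that routes by path and then applies a different strategy per target group. The `/jobs/` prefix, the target names, and the weights are all assumptions made for illustration.

```python
import random
from itertools import cycle

# Hypothetical target groups; names and weights are illustrative only.
user_pool = cycle(["web-1", "web-2", "web-3"])        # round robin
job_weights = {"worker-large": 3, "worker-small": 1}  # weighted
rng = random.Random(0)


def pick_target(path):
    """Route by path prefix, then apply that group's balancing strategy."""
    if path.startswith("/jobs/"):
        names = list(job_weights)
        return rng.choices(names, weights=list(job_weights.values()), k=1)[0]
    return next(user_pool)


user_targets = [pick_target("/users/1") for _ in range(4)]
print(user_targets)  # ['web-1', 'web-2', 'web-3', 'web-1']
print(pick_target("/jobs/resize") in job_weights)  # True
```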
