Microservices · System Design · ~7 min read

Routing and load balancing in Microservices - System Design Guide

Problem Statement
When all client requests go to a single server or service instance, that instance becomes a bottleneck causing slow responses and potential crashes. Without distributing requests properly, some servers may be overloaded while others remain idle, leading to inefficient resource use and poor user experience.
Solution
Routing and load balancing distribute incoming requests across multiple service instances or servers to spread the workload evenly. Routing directs requests based on rules like user location or service version, while load balancing ensures no single instance is overwhelmed by monitoring load and forwarding requests accordingly.
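The two ideas can be separated in a few lines of code: routing picks a pool of instances based on a rule (here, the user's region), and load balancing picks one instance from that pool. This is a minimal sketch; the region names and instance URLs are made-up placeholders, not real endpoints.

```python
import random

# Hypothetical routing table: each region maps to the instances serving it.
ROUTES = {
    'eu': ['http://eu-instance-1', 'http://eu-instance-2'],
    'us': ['http://us-instance-1', 'http://us-instance-2'],
}

def pick_instance(region):
    # Routing: select the pool matching the rule (default to 'us').
    pool = ROUTES.get(region, ROUTES['us'])
    # Load balancing: spread requests across the pool, here at random.
    return random.choice(pool)
```

The same shape works for other routing rules, such as service version or tenant ID; only the keys of the table change.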
Architecture
Clients → Load Balancer → Service A (multiple instances)

This diagram shows clients sending requests to a load balancer, which routes requests to multiple service instances to balance the load.

Trade-offs
✓ Pros
Prevents any single server from becoming a bottleneck by distributing requests evenly.
Improves system availability by rerouting traffic if a server fails.
Enables scaling out by adding more service instances behind the load balancer.
Supports routing rules for directing traffic based on criteria like user location or version.
✗ Cons
Adds an extra network hop, which can introduce slight latency.
The load balancer itself can become a single point of failure if it is not made highly available.
Complexity increases with advanced routing rules and health checks.
Use when your system has multiple service instances handling requests and you expect traffic above 1000 requests per second or need high availability and fault tolerance.
Avoid if your system has only one instance or very low traffic (under 100 requests per second) where the overhead of load balancing outweighs benefits.
Real World Examples
Netflix
Uses load balancers to distribute streaming requests across edge servers, ensuring smooth playback and failover.
Uber
Routes ride requests to the nearest available driver service instance using geographic routing combined with load balancing.
Amazon
Balances API requests across multiple backend services to maintain low latency and high availability during peak shopping times.
Code Example
The before code calls a fixed service instance, risking overload and failure. The after code cycles through multiple instances, distributing requests evenly to balance load.
import requests

# Before: No load balancing, direct calls to a fixed service instance
class ServiceClient:
    def get_data(self):
        # Always calls the same service instance
        response = requests.get('http://service-instance-1/api/data')
        return response.json()


# After: Simple round-robin load balancer client
class LoadBalancerClient:
    def __init__(self):
        self.instances = [
            'http://service-instance-1/api/data',
            'http://service-instance-2/api/data',
            'http://service-instance-3/api/data'
        ]
        self.index = 0

    def get_data(self):
        url = self.instances[self.index]
        self.index = (self.index + 1) % len(self.instances)
        response = requests.get(url)
        return response.json()
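The round-robin client above still fails if the chosen instance is down. One way to add the failover behavior mentioned in the pros is to retry the next instance in the rotation. This sketch injects a `fetch` function instead of calling `requests.get` directly, so the balancing logic stays transport-agnostic; in practice `fetch` would wrap `requests.get` with a timeout.

```python
class FailoverClient:
    """Round-robin with failover: if one instance errors, try the next."""

    def __init__(self, instances, fetch):
        self.instances = list(instances)
        self.index = 0
        self.fetch = fetch  # e.g. lambda url: requests.get(url, timeout=2).json()

    def get_data(self):
        # Try each instance at most once per call.
        for _ in range(len(self.instances)):
            url = self.instances[self.index]
            self.index = (self.index + 1) % len(self.instances)
            try:
                return self.fetch(url)
            except Exception:
                continue  # instance unhealthy, move to the next one
        raise RuntimeError('all instances failed')
```

Real load balancers go further and take failed instances out of rotation via periodic health checks rather than probing on every request.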
Alternatives
Client-side load balancing
Clients decide which service instance to call, removing the need for a centralized load balancer.
Use when: you want to reduce infrastructure complexity and clients can maintain up-to-date service instance lists.
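Client-side balancing need not be plain round-robin: since the client holds the instance list, it can also weight instances, for example giving a larger machine a bigger share of traffic. This is an illustrative sketch; the instance URLs and weights are assumptions.

```python
import random

# Hypothetical instance list with relative capacity weights.
INSTANCES = {
    'http://instance-1': 3,  # larger machine, receives ~3x the traffic
    'http://instance-2': 1,
}

def choose_instance():
    urls = list(INSTANCES)
    weights = [INSTANCES[u] for u in urls]
    # Weighted random choice: probability proportional to capacity.
    return random.choices(urls, weights=weights, k=1)[0]
```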
DNS-based load balancing
Uses DNS to distribute requests by returning different IP addresses for the same domain.
Use when: you need simple load distribution without real-time health checks or fine-grained routing.
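From the client's side, DNS-based balancing looks like resolving the service name to every address DNS returns and picking one. The sketch below uses Python's standard `socket.getaddrinfo`; `localhost` stands in for a real service domain, which in production would return several rotating A records.

```python
import random
import socket

def resolve_all(hostname):
    # Ask DNS for every address published under the name.
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

def pick_address(hostname):
    # Naive client-side distribution over the returned addresses.
    return random.choice(resolve_all(hostname))
```

Note the limitation called out above: DNS has no view of instance health, and resolver caching means a dead address can keep being handed out until the record's TTL expires.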
Summary
Routing and load balancing prevent server overload by distributing client requests across multiple service instances.
They improve system availability, scalability, and fault tolerance by directing traffic intelligently.
Choosing the right load balancing strategy depends on traffic volume, system complexity, and failure tolerance needs.