Microservices · System Design · ~7 min read

Routing and load balancing in Microservices - System Design Guide

Problem Statement
When all client requests go to a single server or service instance, that instance becomes a bottleneck causing slow responses and potential crashes. Without distributing requests properly, some servers may be overloaded while others remain idle, leading to inefficient resource use and poor user experience.
Solution
Routing and load balancing distribute incoming requests across multiple service instances or servers to spread the workload evenly. Routing directs requests based on rules like user location or service version, while load balancing ensures no single instance is overwhelmed by monitoring load and forwarding requests accordingly.
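The two ideas can be separated in a few lines of code: routing picks a pool of instances based on a rule (here, the user's region), and load balancing picks one instance from that pool. This is a minimal sketch; the region names and instance URLs are made-up placeholders, not real endpoints.

```python
import random

# Hypothetical routing table: each region maps to the instances serving it.
ROUTES = {
    'eu': ['http://eu-instance-1', 'http://eu-instance-2'],
    'us': ['http://us-instance-1', 'http://us-instance-2'],
}

def pick_instance(region):
    # Routing: select the pool matching the rule (default to 'us').
    pool = ROUTES.get(region, ROUTES['us'])
    # Load balancing: spread requests across the pool, here at random.
    return random.choice(pool)
```

The same shape works for other routing rules, such as service version or tenant ID; only the keys of the table change.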
Architecture
Clients → Load Balancer → Service A (multiple instances)

This diagram shows clients sending requests to a load balancer, which routes requests to multiple service instances to balance the load.

Trade-offs
✓ Pros
Prevents any single server from becoming a bottleneck by distributing requests evenly.
Improves system availability by rerouting traffic if a server fails.
Enables scaling out by adding more service instances behind the load balancer.
Supports routing rules for directing traffic based on criteria like user location or version.
✗ Cons
Adds an extra network hop, which can introduce slight latency.
The load balancer itself can become a single point of failure if it is not made highly available.
Complexity increases with advanced routing rules and health checks.
Use when your system has multiple service instances handling requests and you expect traffic above 1000 requests per second or need high availability and fault tolerance.
Avoid if your system has only one instance or very low traffic (under 100 requests per second) where the overhead of load balancing outweighs benefits.
Real World Examples
Netflix
Uses load balancers to distribute streaming requests across edge servers, ensuring smooth playback and failover.
Uber
Routes ride requests to the nearest available driver service instance using geographic routing combined with load balancing.
Amazon
Balances API requests across multiple backend services to maintain low latency and high availability during peak shopping times.
Code Example
The before code calls a fixed service instance, risking overload and failure. The after code cycles through multiple instances, distributing requests evenly to balance load.
import requests

# Before: No load balancing, direct calls to a fixed service instance
class ServiceClient:
    def get_data(self):
        # Always calls the same service instance
        response = requests.get('http://service-instance-1/api/data')
        return response.json()


# After: Simple round-robin load balancer client
class LoadBalancerClient:
    def __init__(self):
        self.instances = [
            'http://service-instance-1/api/data',
            'http://service-instance-2/api/data',
            'http://service-instance-3/api/data'
        ]
        self.index = 0

    def get_data(self):
        url = self.instances[self.index]
        self.index = (self.index + 1) % len(self.instances)
        response = requests.get(url)
        return response.json()
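The round-robin client above still fails if the chosen instance is down. One way to add the failover behavior mentioned in the pros is to retry the next instance in the rotation. This sketch injects a `fetch` function instead of calling `requests.get` directly, so the balancing logic stays transport-agnostic; in practice `fetch` would wrap `requests.get` with a timeout.

```python
class FailoverClient:
    """Round-robin with failover: if one instance errors, try the next."""

    def __init__(self, instances, fetch):
        self.instances = list(instances)
        self.index = 0
        self.fetch = fetch  # e.g. lambda url: requests.get(url, timeout=2).json()

    def get_data(self):
        # Try each instance at most once per call.
        for _ in range(len(self.instances)):
            url = self.instances[self.index]
            self.index = (self.index + 1) % len(self.instances)
            try:
                return self.fetch(url)
            except Exception:
                continue  # instance unhealthy, move to the next one
        raise RuntimeError('all instances failed')
```

Real load balancers go further and take failed instances out of rotation via periodic health checks rather than probing on every request.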
Alternatives
Client-side load balancing
Clients decide which service instance to call, removing the need for a centralized load balancer.
Use when: you want to reduce infrastructure complexity and clients can maintain up-to-date service instance lists.
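Client-side balancing need not be plain round-robin: since the client holds the instance list, it can also weight instances, for example giving a larger machine a bigger share of traffic. This is an illustrative sketch; the instance URLs and weights are assumptions.

```python
import random

# Hypothetical instance list with relative capacity weights.
INSTANCES = {
    'http://instance-1': 3,  # larger machine, receives ~3x the traffic
    'http://instance-2': 1,
}

def choose_instance():
    urls = list(INSTANCES)
    weights = [INSTANCES[u] for u in urls]
    # Weighted random choice: probability proportional to capacity.
    return random.choices(urls, weights=weights, k=1)[0]
```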
DNS-based load balancing
Uses DNS to distribute requests by returning different IP addresses for the same domain.
Use when: you need simple load distribution without real-time health checks or fine-grained routing.
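From the client's side, DNS-based balancing looks like resolving the service name to every address DNS returns and picking one. The sketch below uses Python's standard `socket.getaddrinfo`; `localhost` stands in for a real service domain, which in production would return several rotating A records.

```python
import random
import socket

def resolve_all(hostname):
    # Ask DNS for every address published under the name.
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

def pick_address(hostname):
    # Naive client-side distribution over the returned addresses.
    return random.choice(resolve_all(hostname))
```

Note the limitation called out above: DNS has no view of instance health, and resolver caching means a dead address can keep being handed out until the record's TTL expires.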
Summary
Routing and load balancing prevent server overload by distributing client requests across multiple service instances.
They improve system availability, scalability, and fault tolerance by directing traffic intelligently.
Choosing the right load balancing strategy depends on traffic volume, system complexity, and failure tolerance needs.