
Why Load Balancing Matters in Azure

Overview - Why load balancing matters
What is it?
Load balancing is a way to spread work evenly across many computers or servers. It helps make sure no single server gets too busy while others are idle. This keeps websites and apps running smoothly and quickly. Load balancing also helps keep services available even if some servers fail.
Why it matters
Without load balancing, some servers get overwhelmed and slow down or crash, making websites or apps sluggish or unavailable. That frustrates users and costs business. Load balancing shares the work fairly, delivering the speed and reliability people expect every day.
Where it fits
Before learning load balancing, you should understand basic cloud servers and networking. After this, you can learn about advanced traffic management, auto-scaling, and fault tolerance to build highly reliable cloud systems.
Mental Model
Core Idea
Load balancing is like a traffic cop that directs requests evenly to servers so none get stuck in a jam.
Think of it like...
Imagine a busy restaurant with many waiters. The host seats new guests evenly among waiters so no one waiter is overwhelmed and all guests get good service quickly.
┌───────────────┐
│   Clients     │
└──────┬────────┘
       │ Requests
       ▼
┌───────────────┐
│ Load Balancer │
└──────┬────────┘
       │ Distributes
       ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Server 1   │  │   Server 2   │  │   Server 3   │
└──────────────┘  └──────────────┘  └──────────────┘
Build-Up - 7 Steps
1
Foundation: What is Load Balancing?
🤔
Concept: Load balancing means sharing work across multiple servers to avoid overload.
When many people visit a website, their requests go to servers. If one server gets too many requests, it slows down or crashes. Load balancing sends requests to different servers to keep things fast and stable.
Result
Requests are spread out, so no single server is overwhelmed.
Understanding load balancing starts with seeing how spreading work prevents slowdowns and crashes.
2
Foundation: Types of Load Balancers
🤔
Concept: There are different ways to balance load, like hardware devices or software services.
Hardware load balancers are physical devices in data centers. Software load balancers run on servers or cloud platforms like Azure. Azure offers services like Azure Load Balancer and Azure Application Gateway to manage traffic.
Result
You know the basic tools used to balance load in cloud environments.
Knowing the types helps choose the right tool for different needs and environments.
3
Intermediate: How Load Balancers Distribute Traffic
🤔 Before reading on: do you think load balancers send requests randomly or follow a pattern? Commit to your answer.
Concept: Load balancers use rules to decide which server gets each request.
Common methods include round-robin (each server in turn), least connections (server with fewest active users), and IP hash (based on client IP). These methods help balance load fairly and keep sessions consistent.
Result
Traffic is distributed according to chosen rules, improving performance and reliability.
Understanding distribution methods reveals how load balancers optimize user experience and resource use.
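The three methods above can be sketched in a few lines of Python. This is an illustrative model only, not Azure's implementation, and the server names are made up.

```python
import hashlib
from itertools import cycle

servers = ["server-1", "server-2", "server-3"]

# Round-robin: hand each request to the next server in turn.
rotation = cycle(servers)

def round_robin():
    return next(rotation)

# Least connections: pick the server with the fewest active requests.
active = {s: 0 for s in servers}

def least_connections():
    return min(active, key=active.get)

# IP hash: hash the client IP so the same client always lands on the
# same server, which keeps that client's session on one backend.
def ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(round_robin(), round_robin(), round_robin())       # server-1 server-2 server-3
print(ip_hash("203.0.113.7") == ip_hash("203.0.113.7"))  # True: deterministic
```

Note the trade-off: round-robin is simplest, least connections adapts to uneven request durations, and IP hash gives consistency at the cost of possibly uneven spread.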
4
Intermediate: Health Probes Keep Services Reliable
🤔 Before reading on: do you think load balancers send traffic to all servers regardless of health? Commit to your answer.
Concept: Load balancers check if servers are healthy before sending them traffic.
Azure load balancers use health probes that regularly test servers by sending simple requests. If a server doesn't respond correctly, the load balancer stops sending it traffic until it recovers.
Result
Users avoid broken or slow servers, improving uptime and experience.
Knowing health probes explains how load balancers maintain service availability automatically.
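A sketch of that probe logic in Python. Real Azure probes send TCP or HTTP requests on a schedule; here the probe result is passed in directly, and the failure threshold of 2 is an assumed value for illustration, not Azure's default.

```python
from collections import defaultdict

FAILURE_THRESHOLD = 2  # consecutive failed probes before removal (assumed value)
failures = defaultdict(int)

def probe(server, responded):
    """Record one probe result; return True while the server counts as healthy."""
    if responded:
        failures[server] = 0  # any success resets the failure count
    else:
        failures[server] += 1
    return failures[server] < FAILURE_THRESHOLD

assert probe("server-3", responded=False)      # one failure: still in rotation
assert not probe("server-3", responded=False)  # second failure: pulled from rotation
assert probe("server-3", responded=True)       # responds again: traffic resumes
```

Requiring several consecutive failures before removal avoids pulling a server out of rotation over one dropped packet.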
5
Intermediate: Load Balancing and Scalability
🤔
Concept: Load balancing works with scaling to handle more users by adding servers.
When demand grows, cloud systems add more servers (scaling out). Load balancers then spread traffic across all servers. This combination lets apps handle big spikes without slowing down.
Result
Systems can grow smoothly to meet user demand.
Seeing load balancing as part of scaling helps understand how cloud apps stay fast under pressure.
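A tiny model of that interaction, assuming requests split evenly across the pool:

```python
# When the pool scales out, the balancer simply adds the new server to
# its rotation, so each server's share of the traffic drops.
def requests_per_server(total_requests, pool):
    return total_requests // len(pool)

pool = ["server-1", "server-2"]
print(requests_per_server(3000, pool))  # 1500 each

pool.append("server-3")  # scale out during a traffic spike
print(requests_per_server(3000, pool))  # 1000 each
```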
6
Advanced: Session Persistence and Sticky Sessions
🤔 Before reading on: do you think load balancers always send each user to a different server? Commit to your answer.
Concept: Sometimes users must keep talking to the same server for their session to work properly.
Sticky sessions make the load balancer send all requests from one user to the same server. This is important for apps that store user info locally. Azure Application Gateway supports this feature.
Result
User sessions stay consistent, avoiding errors or lost data.
Understanding session persistence reveals how load balancing adapts to app needs beyond simple traffic spread.
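A minimal sticky-session sketch: the first response issues an affinity cookie, and later requests carrying it return to the same server. The cookie naming and routing logic are illustrative, not Application Gateway's actual affinity cookie.

```python
import random

servers = ["server-1", "server-2", "server-3"]
affinity = {}  # cookie value -> assigned server

def route(cookie=None):
    """Return (server, cookie); a known cookie always maps to the same server."""
    if cookie in affinity:
        return affinity[cookie], cookie
    server = random.choice(servers)        # first visit: pick any server
    new_cookie = f"session-{len(affinity) + 1}"
    affinity[new_cookie] = server          # remember the assignment
    return server, new_cookie

server, cookie = route()           # first request issues a cookie
assert route(cookie)[0] == server  # same cookie, same server, every time
```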
7
Expert: Load Balancer Failover and High Availability
🤔 Before reading on: do you think load balancers themselves can fail without backup? Commit to your answer.
Concept: Load balancers must also be reliable and avoid becoming a single point of failure.
Azure uses redundant load balancers and automatic failover to keep traffic flowing even if one load balancer fails. This design ensures continuous service availability.
Result
Load balancing remains effective even during hardware or software failures.
Knowing load balancer failover mechanisms highlights the full reliability chain in cloud infrastructure.
Under the Hood
Load balancers receive incoming network requests and use configured rules to select a backend server. They track server health via probes and maintain connection states if needed. In Azure, load balancers operate at different layers: Azure Load Balancer works at the transport layer (TCP/UDP), while Application Gateway works at the application layer (HTTP/HTTPS), allowing more advanced routing.
Why designed this way?
Load balancing was designed to solve the problem of uneven workload distribution and single points of failure. Early systems had simple round-robin methods, but as apps grew complex, features like health probes and session persistence were added. Azure's layered approach allows flexibility for different app types and scales.
┌───────────────┐
│ Client Request│
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Load Balancer │
├───────────────┤
│ Health Probes │
│ Distribution  │
│ Rules         │
└──────┬────────┘
       │
       ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Server 1     │  │ Server 2     │  │ Server 3     │
│ (Healthy)    │  │ (Healthy)    │  │ (Unhealthy)  │
└──────────────┘  └──────────────┘  └──────────────┘
       ▲               ▲               ▲
       └───────────────┴───────────────┘
               Health Probe Checks
Myth Busters - 4 Common Misconceptions
Quick: Do load balancers always send traffic equally to all servers, no matter what? Commit to yes or no.
Common Belief: Load balancers just split traffic evenly among all servers all the time.
Reality: Load balancers send traffic based on rules and server health; unhealthy servers get no traffic.
Why it matters: Ignoring health checks can cause users to get errors or slow responses from broken servers.
Quick: Do you think load balancers can fix slow servers by themselves? Commit to yes or no.
Common Belief: Load balancers can speed up slow servers by balancing load.
Reality: Load balancers only distribute traffic; they cannot fix server performance issues.
Why it matters: Relying on load balancing alone can hide underlying problems, leading to poor user experience.
Quick: Do you think session persistence is always on by default? Commit to yes or no.
Common Belief: Load balancers always send users to the same server to keep sessions consistent.
Reality: Session persistence is optional and must be configured; otherwise, users may be sent to different servers.
Why it matters: Not configuring persistence can break apps that rely on session data stored on one server.
Quick: Do you think load balancers themselves never fail? Commit to yes or no.
Common Belief: Load balancers are always reliable and cannot fail.
Reality: Load balancers can fail, so cloud providers build redundancy and failover to keep services running.
Why it matters: Assuming load balancers never fail risks single points of failure and downtime.
Expert Zone
1
Some load balancing algorithms can cause uneven load if server capacities differ; weighting servers helps balance this.
2
Application-layer load balancers can inspect and route traffic based on content, enabling advanced scenarios like A/B testing.
3
Health probes must be carefully designed to avoid false positives or negatives that can cause traffic to be sent to unhealthy servers.
When NOT to use
Load balancing alone does not suit stateful applications that cannot share session data between servers; such apps need sticky sessions or a distributed cache alongside it. For very low-traffic or single-server apps, load balancing adds unnecessary complexity.
Production Patterns
In production, Azure load balancers are combined with auto-scaling groups to add or remove servers automatically. Application Gateway is used for web apps needing SSL termination and URL-based routing. Multi-region load balancing distributes traffic globally for disaster recovery and latency optimization.
Connections
Auto-scaling
Load balancing works together with auto-scaling to handle changing traffic by adding or removing servers.
Understanding load balancing helps grasp how cloud systems grow and shrink smoothly to meet demand.
Fault Tolerance
Load balancing contributes to fault tolerance by routing traffic away from failed servers.
Knowing load balancing clarifies how systems stay available despite hardware or software failures.
Traffic Management in Road Systems
Both manage flow to avoid congestion and ensure smooth movement.
Seeing load balancing like road traffic control reveals universal principles of distributing work to prevent jams.
Common Pitfalls
#1 Sending traffic to unhealthy servers causes errors.
Wrong approach: Configure the load balancer without health probes, or disable them. Example: Azure Load Balancer with no health probe configured.
Correct approach: Configure health probes to check server health regularly. Example: Azure Load Balancer with a TCP health probe on port 80.
Root cause: Not understanding that load balancers rely on health checks to avoid broken servers.
#2 Not configuring session persistence breaks user sessions.
Wrong approach: Use a load balancer without sticky sessions for apps that need session data. Example: Azure Application Gateway without cookie-based affinity.
Correct approach: Enable session persistence (cookie-based affinity) for stateful apps. Example: Azure Application Gateway with cookie affinity enabled.
Root cause: Assuming load balancers always keep users on the same server by default.
#3 Using simple round-robin when servers have different capacities causes overload.
Wrong approach: Configure the load balancer with round-robin only, ignoring differences in server power.
Correct approach: Use weighted load balancing to send more traffic to stronger servers.
Root cause: Not considering server capacity differences in traffic distribution.
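The weighted fix can be sketched as a rotation in which each server appears as many times as its weight; the names and capacities here are invented for illustration.

```python
from itertools import cycle

weights = {"big-server": 2, "small-server": 1}  # hypothetical relative capacities
rotation = cycle([s for s, w in weights.items() for _ in range(w)])

# Count where 30 requests land: the weight-2 server gets twice the traffic.
hits = {s: 0 for s in weights}
for _ in range(30):
    hits[next(rotation)] += 1
print(hits)  # {'big-server': 20, 'small-server': 10} (a 2:1 split)
```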
Key Takeaways
Load balancing spreads user requests across multiple servers to keep apps fast and reliable.
It uses rules and health checks to send traffic only to healthy servers, avoiding downtime.
Session persistence is important for apps that need users to stay connected to the same server.
Load balancers themselves must be highly available to prevent becoming a single point of failure.
In cloud systems like Azure, load balancing works closely with scaling and fault tolerance to build resilient applications.