
Rate limiting algorithms (token bucket, leaky bucket) in HLD - System Design Guide

Problem Statement
When a service receives too many requests in a short time, it can become overwhelmed, leading to slow responses or crashes. Without controlling the request rate, users or clients can unintentionally or maliciously flood the system, causing denial of service or degraded experience for others.
Solution
Rate limiting algorithms control how many requests a client can make in a given time. The token bucket algorithm allows bursts by accumulating tokens that represent permission to proceed, while the leaky bucket algorithm processes requests at a steady rate, smoothing out bursts. Both prevent overload by rejecting or delaying excess requests.
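To make the token bucket idea concrete, here is a minimal single-process sketch in Python (class and parameter names are our own; a production limiter would also need thread safety and, in a distributed deployment, shared token state):

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at a fixed rate; each request spends one.
    Unused capacity accumulates (up to `capacity`), which is what permits bursts."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # max stored tokens (burst size)
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A bucket holding 5 tokens, refilled at 1 token/second:
bucket = TokenBucket(capacity=5, refill_rate=1.0)
results = [bucket.allow() for _ in range(7)]  # burst of 7 back-to-back requests
# The first 5 pass on stored tokens; the rest are rejected until refill catches up.
```

Note that the sustained rate is bounded by `refill_rate`, while `capacity` caps how large a burst can be served at once.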
Architecture
Client → Rate Limiter (token bucket or leaky bucket) → Service

This diagram shows a client sending requests to a rate limiter that uses either a token bucket or leaky bucket algorithm to control request flow before forwarding allowed requests to the service.

Trade-offs
✓ Pros
Token bucket allows short bursts of traffic, improving user experience during sudden spikes.
Leaky bucket enforces a smooth, constant request rate, preventing sudden overloads.
Both algorithms protect backend services from being overwhelmed by excessive requests.
✗ Cons
Token bucket complexity increases in distributed systems, where token counts must be kept synchronized across instances.
Leaky bucket can introduce latency by queuing requests to maintain steady output.
Implementing accurate rate limiting requires careful handling of time and concurrency.
When to use
Use token bucket when burst tolerance is needed and occasional spikes are acceptable. Use leaky bucket when a steady, predictable request rate is critical. Apply rate limiting when request rates exceed hundreds or thousands per second to protect system stability.
When to avoid
Avoid complex rate limiting algorithms for very low traffic systems under 100 requests per second, where overhead outweighs benefits. Do not use leaky bucket if burst traffic is common and must be served quickly.
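For contrast with the token bucket, here is a minimal sketch of a leaky bucket in Python. This is the "meter" variant, which rejects overflow instead of queuing it (names are our own; a queue-based variant would instead buffer excess requests and drain them at the leak rate, trading rejections for added latency):

```python
import time

class LeakyBucket:
    """Leaky bucket (meter variant): requests fill the bucket; it drains at a
    constant rate. A full bucket rejects new requests, keeping output smooth."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity        # max outstanding requests the bucket holds
        self.leak_rate = leak_rate      # requests drained per second
        self.level = 0.0
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain at the constant leak rate since the last check.
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False

# A bucket of capacity 3 draining at 2 requests/second:
bucket = LeakyBucket(capacity=3, leak_rate=2.0)
burst = [bucket.allow() for _ in range(5)]  # 5 near-simultaneous requests
# Only 3 fit; the burst is clipped rather than served all at once.
```

Whatever arrives, the drain proceeds at `leak_rate`, which is what gives downstream services a steady, predictable load.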
Real World Examples
Amazon
Amazon uses token bucket rate limiting in AWS API Gateway to allow clients to burst requests while protecting backend APIs from overload.
Twitter
Twitter applies leaky bucket algorithms to smooth out tweet posting rates, preventing sudden spikes from overwhelming their servers.
Google
Google Cloud services implement token bucket rate limiting to balance user request bursts with backend capacity.
Alternatives
Fixed Window Counter
Counts requests in fixed time windows without smoothing bursts, leading to potential spikes at window boundaries.
Use when: Simplicity matters more than smooth rate control and traffic is relatively uniform.
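A minimal sketch of the fixed window counter in Python (names are our own). The boundary-spike weakness is visible in the structure: the counter resets abruptly at each window edge, so up to twice the limit can pass in a short span straddling the boundary:

```python
import time

class FixedWindowCounter:
    """Fixed window: count requests per discrete time window, reset each window.
    Simple and cheap, but bursty around window boundaries."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self) -> bool:
        window = int(time.monotonic() // self.window)
        if window != self.current_window:
            self.current_window = window   # new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

# At most 3 requests per 60-second window:
limiter = FixedWindowCounter(limit=3, window_seconds=60)
hits = [limiter.allow() for _ in range(4)]  # 4th request in the same window is rejected
```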
Sliding Window Log
Tracks timestamps of each request for precise rate limiting but requires more memory and processing.
Use when: Accurate per-request rate limiting is needed and system resources allow.
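A minimal sketch of the sliding window log in Python (names are our own). The memory cost noted above is explicit here: one stored timestamp per accepted request, evicted as it ages out of the trailing window:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding window log: keep a timestamp per accepted request and count only
    those inside the trailing window. Exact, but O(limit) memory per client."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()   # timestamps of accepted requests, oldest first

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have aged out of the trailing window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

# At most 2 requests per trailing 1-second window:
limiter = SlidingWindowLog(limit=2, window_seconds=1.0)
decisions = [limiter.allow() for _ in range(3)]  # 3rd immediate request is rejected
```

Because the window slides continuously, this avoids the fixed window's boundary spikes, at the cost of storing and scanning per-request timestamps.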
Summary
Rate limiting algorithms prevent system overload by controlling request rates.
Token bucket allows bursts by accumulating tokens, while leaky bucket smooths traffic at a fixed rate.
Choosing the right algorithm depends on whether burst tolerance or steady output is more important.