
Rate limiting algorithms (token bucket, leaky bucket) in HLD - System Design Exercise

Design: Rate Limiting System
Design the core rate limiting algorithms and their integration in a distributed API gateway or service. Out of scope: detailed user authentication, billing, or analytics.
Functional Requirements
FR1: Limit the number of requests a user or client can make in a given time window
FR2: Support burst traffic without dropping all requests immediately
FR3: Prevent system overload by smoothing request rates
FR4: Provide configurable limits per user or API key
FR5: Handle millions of requests per second across distributed servers
FR6: Ensure fairness and prevent abuse
Non-Functional Requirements
NFR1: Latency impact on request processing must be minimal (p99 < 10ms)
NFR2: System must be highly available (99.9% uptime)
NFR3: Rate limits must be consistent across distributed instances
NFR4: Support horizontal scaling to handle increasing traffic
NFR5: Memory usage per user/client must be efficient
Think Before You Design
Questions to Ask
❓ What request rates and burst sizes should the system handle, per client and globally?
❓ Should limits apply per user, per API key, per IP address, or some combination?
❓ Must limits be strictly enforced, or is slight over-admission acceptable for lower latency?
❓ What should a rejected client receive (HTTP 429, Retry-After header, queuing)?
❓ Does rate limit state need to survive node failures, or can it be rebuilt on restart?
Key Components
API Gateway or Load Balancer
Rate Limiter module implementing token bucket or leaky bucket
Distributed cache or in-memory store (e.g., Redis) for counters
Configuration service for rate limit policies
Monitoring and alerting system
Design Patterns
Token Bucket algorithm for burst handling
Leaky Bucket algorithm for smoothing traffic
Distributed counters with atomic increments
Sliding window rate limiting
Client-side vs server-side rate limiting
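The token bucket pattern above can be sketched as a minimal single-node class. This is an illustration only; the class name, float-based token accounting, and injectable clock are assumptions, not part of the exercise.

```python
import time
from typing import Optional

class TokenBucket:
    """Single-node token bucket: allows bursts up to `capacity`,
    refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 now: Optional[float] = None):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start full so an initial burst is served
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # consume one token for this request
            return True
        return False
```

With capacity 3 and a refill rate of 1 token/second, a burst of 5 simultaneous requests admits 3 and rejects 2; two seconds later, requests are admitted again.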
Reference Architecture
Client --> API Gateway --> Rate Limiter --> Backend Service
                    |              |
                    |              v
                    |          Distributed Cache
                    v
               Configuration Service
Components
API Gateway
Nginx or Envoy proxy
Entry point for client requests, routes traffic, and enforces rate limits
Rate Limiter
Custom module or middleware
Implements token bucket or leaky bucket algorithm to allow or reject requests
Distributed Cache
Redis with atomic commands
Stores counters and tokens per user/client for distributed consistency
Configuration Service
Central config store (e.g., Consul, etcd)
Manages rate limit policies and parameters
Request Flow
1. Client sends request to API Gateway
2. API Gateway forwards the request to the Rate Limiter module
3. Rate Limiter fetches the current token count or leak state from the Distributed Cache
4. Rate Limiter applies token bucket or leaky bucket logic:
   - Token Bucket: refill tokens at a fixed rate, consume one token per request
   - Leaky Bucket: requests enter a queue and leak out at a fixed rate
5. If a token is available or the bucket is not full, allow the request and update the cache
6. If the limit is exceeded, reject the request with HTTP 429 Too Many Requests
7. Allowed requests proceed to the Backend Service
8. Configuration Service updates rate limit parameters dynamically
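The fetch/apply/update steps of the flow can be sketched for the leaky bucket (here in its "meter" form, tracking a draining water level rather than a literal queue). A plain dict stands in for the distributed cache; in a real deployment the check-and-update would have to be atomic on the cache side, e.g. via a Redis Lua script, to avoid races between gateway instances.

```python
# In-memory stand-in for the distributed cache (user_id -> (level, last_ts)).
cache: dict = {}

def leaky_bucket_allow(user_id: str, capacity: int,
                       leak_rate: float, now: float) -> bool:
    """Leaky bucket as a meter: the level drains at `leak_rate` per second;
    a request is admitted only if the level stays within `capacity`."""
    level, last = cache.get(user_id, (0.0, now))
    # Leak out whatever drained since the last request.
    level = max(0.0, level - (now - last) * leak_rate)
    if level + 1 <= capacity:
        cache[user_id] = (level + 1, now)  # admit: raise the level
        return True
    cache[user_id] = (level, now)          # reject: level unchanged
    return False
```

With capacity 2 and a leak rate of 1/second, two back-to-back requests pass, a third at the same instant is rejected, and one second later the bucket has drained enough to admit again.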
Database Schema
Entities:
- User or Client: id, rate_limit_policy_id
- RateLimitPolicy: id, max_tokens, refill_rate, bucket_capacity, algorithm_type
- TokenBucketState: user_id, tokens, last_refill_timestamp
- LeakyBucketState: user_id, queue_size, last_leak_timestamp
Relationships:
- User references RateLimitPolicy
- TokenBucketState and LeakyBucketState track per-user algorithm state
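The policy and per-user state entities can be sketched as dataclasses; the field types and the enum values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class AlgorithmType(str, Enum):
    TOKEN_BUCKET = "token_bucket"
    LEAKY_BUCKET = "leaky_bucket"

@dataclass
class RateLimitPolicy:
    id: int
    max_tokens: int
    refill_rate: float       # tokens added per second
    bucket_capacity: int
    algorithm_type: AlgorithmType

@dataclass
class TokenBucketState:
    user_id: int
    tokens: float            # current token count
    last_refill_timestamp: float

@dataclass
class LeakyBucketState:
    user_id: int
    queue_size: int          # requests currently queued
    last_leak_timestamp: float
```

In practice these rows live in the distributed cache keyed by user_id, with the policy table in the configuration service.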
Scaling Discussion
Bottlenecks
Distributed cache becoming a single point of failure or bottleneck
High latency due to frequent cache reads/writes for counters
Inconsistent rate limiting state across distributed nodes
Memory overhead for storing state of millions of users
Handling sudden traffic spikes causing bursts beyond capacity
Solutions
Use Redis clusters with sharding and replication for high availability
Batch or approximate counter updates to reduce cache operations
Use consistent hashing to route requests from the same user to the same node
Evict or expire inactive user states to save memory
Implement hierarchical rate limiting with global and local limits
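One concrete way to realize the "approximate counter" idea above is the sliding-window counter: keep only two counters per client (current and previous fixed window) and weight the previous window by its remaining overlap with the sliding window. A sketch, assuming the mutable-dict state shape shown here:

```python
def sliding_window_allow(state: dict, limit: int,
                         window: float, now: float) -> bool:
    """Sliding-window counter: O(1) memory per client, approximate.
    Weights the previous fixed window's count by how much of it still
    overlaps the sliding window ending at `now`."""
    current_start = now - (now % window)
    if state.get("window_start") != current_start:
        # Roll over: the just-finished window becomes the "previous" one,
        # but only if it was actually the immediately preceding window.
        prev = (state.get("count", 0)
                if state.get("window_start") == current_start - window else 0)
        state.update(window_start=current_start, prev_count=prev, count=0)
    # Fraction of the sliding window still covered by the previous window.
    overlap = 1.0 - (now - current_start) / window
    estimate = state["prev_count"] * overlap + state["count"]
    if estimate + 1 <= limit:
        state["count"] += 1
        return True
    return False
```

Compared with a per-request timestamp log this needs only two integers per client, at the cost of slight inaccuracy at window boundaries, which is the accuracy-versus-performance trade-off noted in the interview tips.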
Interview Tips
Time: Spend 10 minutes understanding requirements and clarifying assumptions, 20 minutes designing the architecture and algorithms, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Explain difference between token bucket and leaky bucket algorithms
Discuss how burst traffic is handled gracefully
Describe how distributed state is managed for consistency
Highlight latency and availability considerations
Mention trade-offs between accuracy and performance