
Rate limiting algorithms (token bucket, leaky bucket) in HLD - System Design Exercise

Design: Rate Limiting System
Design the core rate limiting algorithms and their integration in a distributed API gateway or service. Out of scope: detailed user authentication, billing, or analytics.
Functional Requirements
FR1: Limit the number of requests a user or client can make in a given time window
FR2: Support burst traffic without dropping all requests immediately
FR3: Prevent system overload by smoothing request rates
FR4: Provide configurable limits per user or API key
FR5: Handle millions of requests per second across distributed servers
FR6: Ensure fairness and prevent abuse
Non-Functional Requirements
NFR1: Latency impact on request processing must be minimal (p99 < 10ms)
NFR2: System must be highly available (99.9% uptime)
NFR3: Rate limits must be consistent across distributed instances
NFR4: Support horizontal scaling to handle increasing traffic
NFR5: Memory usage per user/client must be efficient
Think Before You Design
Questions to Ask
❓ What request rates and burst sizes should the system handle, per client and globally?
❓ Should limits apply per user, per API key, per IP address, or some combination?
❓ Must limits be strictly enforced, or is slight over-admission acceptable for lower latency?
❓ What should a rejected client receive (HTTP 429, Retry-After header, queuing)?
❓ Does rate limit state need to survive node failures, or can it be rebuilt on restart?
Key Components
API Gateway or Load Balancer
Rate Limiter module implementing token bucket or leaky bucket
Distributed cache or in-memory store (e.g., Redis) for counters
Configuration service for rate limit policies
Monitoring and alerting system
Design Patterns
Token Bucket algorithm for burst handling
Leaky Bucket algorithm for smoothing traffic
Distributed counters with atomic increments
Sliding window rate limiting
Client-side vs server-side rate limiting
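The token bucket pattern above can be sketched as a minimal single-node class. This is an illustration only; the class name, float-based token accounting, and injectable clock are assumptions, not part of the exercise.

```python
import time
from typing import Optional

class TokenBucket:
    """Single-node token bucket: allows bursts up to `capacity`,
    refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 now: Optional[float] = None):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start full so an initial burst is served
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # consume one token for this request
            return True
        return False
```

With capacity 3 and a refill rate of 1 token/second, a burst of 5 simultaneous requests admits 3 and rejects 2; two seconds later, requests are admitted again.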
Reference Architecture
Client --> API Gateway --> Rate Limiter --> Backend Service
                    |              |
                    |              v
                    |          Distributed Cache
                    v
               Configuration Service
Components
API Gateway
Nginx or Envoy proxy
Entry point for client requests, routes traffic, and enforces rate limits
Rate Limiter
Custom module or middleware
Implements token bucket or leaky bucket algorithm to allow or reject requests
Distributed Cache
Redis with atomic commands
Stores counters and tokens per user/client for distributed consistency
Configuration Service
Central config store (e.g., Consul, etcd)
Manages rate limit policies and parameters
Request Flow
1. Client sends request to API Gateway
2. API Gateway forwards the request to the Rate Limiter module
3. Rate Limiter fetches the current token count or leak state from the Distributed Cache
4. Rate Limiter applies token bucket or leaky bucket logic:
   - Token Bucket: refill tokens at a fixed rate, consume one token per request
   - Leaky Bucket: requests enter a queue and leak out at a fixed rate
5. If a token is available or the bucket is not full, allow the request and update the cache
6. If the limit is exceeded, reject the request with HTTP 429 Too Many Requests
7. Allowed requests proceed to the Backend Service
8. Configuration Service updates rate limit parameters dynamically
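The fetch/apply/update steps of the flow can be sketched for the leaky bucket (here in its "meter" form, tracking a draining water level rather than a literal queue). A plain dict stands in for the distributed cache; in a real deployment the check-and-update would have to be atomic on the cache side, e.g. via a Redis Lua script, to avoid races between gateway instances.

```python
# In-memory stand-in for the distributed cache (user_id -> (level, last_ts)).
cache: dict = {}

def leaky_bucket_allow(user_id: str, capacity: int,
                       leak_rate: float, now: float) -> bool:
    """Leaky bucket as a meter: the level drains at `leak_rate` per second;
    a request is admitted only if the level stays within `capacity`."""
    level, last = cache.get(user_id, (0.0, now))
    # Leak out whatever drained since the last request.
    level = max(0.0, level - (now - last) * leak_rate)
    if level + 1 <= capacity:
        cache[user_id] = (level + 1, now)  # admit: raise the level
        return True
    cache[user_id] = (level, now)          # reject: level unchanged
    return False
```

With capacity 2 and a leak rate of 1/second, two back-to-back requests pass, a third at the same instant is rejected, and one second later the bucket has drained enough to admit again.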
Database Schema
Entities:
- User or Client: id, rate_limit_policy_id
- RateLimitPolicy: id, max_tokens, refill_rate, bucket_capacity, algorithm_type
- TokenBucketState: user_id, tokens, last_refill_timestamp
- LeakyBucketState: user_id, queue_size, last_leak_timestamp
Relationships:
- User references RateLimitPolicy
- TokenBucketState and LeakyBucketState track per-user algorithm state
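The policy and per-user state entities can be sketched as dataclasses; the field types and the enum values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class AlgorithmType(str, Enum):
    TOKEN_BUCKET = "token_bucket"
    LEAKY_BUCKET = "leaky_bucket"

@dataclass
class RateLimitPolicy:
    id: int
    max_tokens: int
    refill_rate: float       # tokens added per second
    bucket_capacity: int
    algorithm_type: AlgorithmType

@dataclass
class TokenBucketState:
    user_id: int
    tokens: float            # current token count
    last_refill_timestamp: float

@dataclass
class LeakyBucketState:
    user_id: int
    queue_size: int          # requests currently queued
    last_leak_timestamp: float
```

In practice these rows live in the distributed cache keyed by user_id, with the policy table in the configuration service.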
Scaling Discussion
Bottlenecks
Distributed cache becoming a single point of failure or bottleneck
High latency due to frequent cache reads/writes for counters
Inconsistent rate limiting state across distributed nodes
Memory overhead for storing state of millions of users
Handling sudden traffic spikes causing bursts beyond capacity
Solutions
Use Redis clusters with sharding and replication for high availability
Batch or approximate counter updates to reduce cache operations
Use consistent hashing to route requests from the same user to the same node
Evict or expire inactive user states to save memory
Implement hierarchical rate limiting with global and local limits
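One concrete way to realize the "approximate counter" idea above is the sliding-window counter: keep only two counters per client (current and previous fixed window) and weight the previous window by its remaining overlap with the sliding window. A sketch, assuming the mutable-dict state shape shown here:

```python
def sliding_window_allow(state: dict, limit: int,
                         window: float, now: float) -> bool:
    """Sliding-window counter: O(1) memory per client, approximate.
    Weights the previous fixed window's count by how much of it still
    overlaps the sliding window ending at `now`."""
    current_start = now - (now % window)
    if state.get("window_start") != current_start:
        # Roll over: the just-finished window becomes the "previous" one,
        # but only if it was actually the immediately preceding window.
        prev = (state.get("count", 0)
                if state.get("window_start") == current_start - window else 0)
        state.update(window_start=current_start, prev_count=prev, count=0)
    # Fraction of the sliding window still covered by the previous window.
    overlap = 1.0 - (now - current_start) / window
    estimate = state["prev_count"] * overlap + state["count"]
    if estimate + 1 <= limit:
        state["count"] += 1
        return True
    return False
```

Compared with a per-request timestamp log this needs only two integers per client, at the cost of slight inaccuracy at window boundaries, which is the accuracy-versus-performance trade-off noted in the interview tips.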
Interview Tips
Time: Spend 10 minutes understanding requirements and clarifying assumptions, 20 minutes designing the architecture and algorithms, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Explain difference between token bucket and leaky bucket algorithms
Discuss how burst traffic is handled gracefully
Describe how distributed state is managed for consistency
Highlight latency and availability considerations
Mention trade-offs between accuracy and performance