
Design a Rate Limiter - HLD System Design Exercise

Design: Rate Limiter
The design focuses on the rate-limiting component integrated with an API gateway or service. Authentication and API business logic are out of scope.
Functional Requirements
FR1: Limit the number of requests a user can make in a given time window
FR2: Support different rate limits for different users or API keys
FR3: Provide real-time feedback when limits are exceeded
FR4: Ensure minimal latency impact on request processing
FR5: Allow configuration of limits per API endpoint
FR6: Support distributed deployment for scalability
Non-Functional Requirements
NFR1: Handle up to 100,000 requests per second
NFR2: Latency for rate limit check should be under 10ms (p99)
NFR3: Availability target of 99.9% uptime
NFR4: Rate limits must be enforced accurately across multiple servers
NFR5: System should be resilient to clock skew between servers
Key Components
API Gateway or Proxy to intercept requests
In-memory cache or datastore for counters (e.g., Redis)
Distributed synchronization mechanism
Configuration service for rate limit rules
Monitoring and alerting system
Design Patterns
Fixed Window Counter
Sliding Window Log
Sliding Window Counter
Token Bucket
Leaky Bucket
Distributed Cache with Atomic Operations
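Of these patterns, the token bucket is a common default because it allows short bursts while enforcing a sustained rate. A minimal single-node sketch (class and parameter names are illustrative, not from any specific library):

```python
import time

class TokenBucket:
    """Single-node token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, rate=1.0)  # 5-request burst, 1 req/sec sustained
results = [bucket.allow() for _ in range(7)]
# First 5 calls drain the burst capacity; the rest are rejected until refill
```

In a distributed deployment the token count and last-refill timestamp would live in the shared cache rather than in process memory, updated atomically.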
Reference Architecture
Client
  |
  v
API Gateway / Proxy
  |
  v
Rate Limiter Service <--> Configuration Store
  |
  v
Distributed Cache (Redis Cluster)
  |
  v
Backend Services

Monitoring & Alerting System (observes Rate Limiter metrics)
Components
API Gateway / Proxy
Nginx, Envoy, or custom proxy
Intercept incoming requests and forward them to the rate limiter before they reach the backend
Rate Limiter Service
Stateless microservice in Go/Java/Python
Check and enforce rate limits using counters in distributed cache
Distributed Cache
Redis Cluster with atomic INCR and EXPIRE commands
Store counters for requests per user/key with TTL for time windows
Configuration Store
Relational DB or NoSQL (PostgreSQL, DynamoDB)
Store rate limit rules per user, API key, or endpoint
Monitoring & Alerting
Prometheus + Grafana or Datadog
Track rate limiter performance, errors, and usage patterns
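The Distributed Cache component above hinges on Redis's atomic INCR plus EXPIRE per time window. A single-process sketch of that pattern, with a plain dict standing in for Redis (names are illustrative):

```python
import time

class FixedWindowCounter:
    """Single-process stand-in for the Redis INCR + EXPIRE pattern.
    In production each (target, window) pair would be one Redis key,
    incremented with INCR and expired after `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (target, window_start) -> request count

    def allow(self, target, now=None):
        now = time.time() if now is None else now
        window_start = int(now // self.window) * self.window
        key = (target, window_start)
        # INCR: atomic in Redis; a plain increment suffices in one process
        self.counters[key] = self.counters.get(key, 0) + 1
        # EXPIRE is implicit here: keys for old windows are never read again
        return self.counters[key] <= self.limit

limiter = FixedWindowCounter(limit=3, window_seconds=60)
decisions = [limiter.allow("user-42", now=100 + i) for i in range(5)]
# First 3 requests in the window are allowed, the rest rejected
```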
Request Flow
1. Client sends request to API Gateway
2. API Gateway forwards request to Rate Limiter Service
3. Rate Limiter Service fetches applicable rate limit rules from Configuration Store or cache
4. Rate Limiter Service increments request counter in Distributed Cache atomically
5. If counter exceeds limit, Rate Limiter Service rejects request with 429 Too Many Requests
6. If under limit, Rate Limiter Service allows request to proceed to backend
7. Rate Limiter Service returns response to API Gateway
8. API Gateway forwards response to client
9. Monitoring system collects metrics from Rate Limiter Service
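Steps 3-6 of the flow can be condensed into one decision function. The rule table and counter dict below are simplified stand-ins for the Configuration Store and the distributed cache:

```python
# Illustrative rule table; real rules come from the Configuration Store
RULES = {"api-key-1": {"limit": 2, "window": 60}}
DEFAULT_RULE = {"limit": 100, "window": 60}
COUNTS = {}  # stands in for the distributed cache

def handle(api_key, now):
    """Return (status, headers) for one request, mirroring flow steps 3-6."""
    rule = RULES.get(api_key, DEFAULT_RULE)            # step 3: fetch rule
    window_start = int(now // rule["window"]) * rule["window"]
    key = (api_key, window_start)
    COUNTS[key] = COUNTS.get(key, 0) + 1               # step 4: atomic INCR
    remaining = max(0, rule["limit"] - COUNTS[key])
    headers = {"X-RateLimit-Limit": rule["limit"],
               "X-RateLimit-Remaining": remaining}
    if COUNTS[key] > rule["limit"]:                    # step 5: over limit
        headers["Retry-After"] = window_start + rule["window"] - now
        return 429, headers
    return 200, headers                                # step 6: forward to backend

statuses = [handle("api-key-1", now=10)[0] for _ in range(3)]
# → [200, 200, 429]
```

Returning `Retry-After` and the conventional `X-RateLimit-*` headers gives clients the real-time feedback called for in FR3.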
Database Schema
Entities:
- User (user_id, user_type, etc.)
- APIKey (key_id, user_id, permissions)
- RateLimitRule (rule_id, target_type [user/api_key/endpoint], target_id, limit_count, window_seconds)

Relationships:
- User 1:N APIKey
- RateLimitRule applies to a User, APIKey, or Endpoint

Counters are stored in Redis under keys "rate_limit:{target_id}:{window_start_timestamp}" holding an integer count, with TTL equal to window_seconds.
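Because the key embeds the window start, every server derives the same key deterministically without coordination. A sketch of the key construction (helper name is illustrative):

```python
def counter_key(target_id, window_seconds, now):
    """Build 'rate_limit:{target_id}:{window_start_timestamp}' for the
    fixed window containing `now` (a Unix timestamp in seconds)."""
    window_start = int(now // window_seconds) * window_seconds
    return f"rate_limit:{target_id}:{window_start}"

key = counter_key("user-42", 60, now=1700000123)
# → "rate_limit:user-42:1700000100"
# In Redis: INCR this key, then EXPIRE <key> 60 on the first increment
```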
Scaling Discussion
Bottlenecks
Distributed cache becoming a single point of failure or bottleneck under high QPS
Synchronization issues causing inaccurate counters due to race conditions
Latency increase due to network calls to cache for every request
Configuration store latency or stale rules causing incorrect enforcement
Handling sudden traffic spikes causing burst limit breaches
Solutions
Use Redis Cluster with sharding and replication for high availability and throughput
Use atomic increment operations and Lua scripts in Redis to ensure consistency
Implement local caching of rate limit rules to reduce configuration store calls
Use approximate algorithms (e.g., sliding window counters) to reduce strict locking
Implement burst handling with token bucket allowing short bursts without penalty
Deploy rate limiter service close to API gateway to reduce network latency
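The sliding-window-counter approximation mentioned above estimates the current rate from two fixed-window counters, weighting the previous window's count by how much of it still overlaps the sliding window. A sketch (function and parameter names are illustrative):

```python
def sliding_window_allow(prev_count, curr_count, limit, window_seconds, elapsed):
    """Approximate a true sliding window from two fixed-window counters.
    `elapsed` is how far we are into the current window, in seconds."""
    # Fraction of the previous window still inside the sliding window
    overlap = (window_seconds - elapsed) / window_seconds
    estimated = prev_count * overlap + curr_count
    return estimated < limit

# 30s into a 60s window: half of the previous window's count still applies
allowed = sliding_window_allow(prev_count=40, curr_count=75, limit=100,
                               window_seconds=60, elapsed=30)
# estimate = 40 * 0.5 + 75 = 95 < 100, so the request is allowed
rejected = sliding_window_allow(prev_count=40, curr_count=85, limit=100,
                                window_seconds=60, elapsed=30)
# estimate = 105 >= 100, so the request is rejected
```

This trades exactness for cheap reads (two counters instead of a per-request log), which is the accuracy-versus-performance trade-off noted in the interview tips.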
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing architecture and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Clarify types of rate limiting and use cases
Explain choice of data store and atomic operations for counters
Discuss trade-offs between accuracy and performance
Highlight distributed consistency challenges and solutions
Mention monitoring importance for operational health
Show awareness of scaling bottlenecks and mitigation strategies