
Design a Rate Limiter - HLD System Design Exercise

Design: Rate Limiter
The design focuses on the rate-limiting component integrated with an API gateway or service. Authentication and API business logic are out of scope.
Functional Requirements
FR1: Limit the number of requests a user can make in a given time window
FR2: Support different rate limits for different users or API keys
FR3: Provide real-time feedback when limits are exceeded
FR4: Ensure minimal latency impact on request processing
FR5: Allow configuration of limits per API endpoint
FR6: Support distributed deployment for scalability
Non-Functional Requirements
NFR1: Handle up to 100,000 requests per second
NFR2: Latency for rate limit check should be under 10ms (p99)
NFR3: Availability target of 99.9% uptime
NFR4: Rate limits must be enforced accurately across multiple servers
NFR5: System should be resilient to clock skew between servers
Key Components
API Gateway or Proxy to intercept requests
In-memory cache or datastore for counters (e.g., Redis)
Distributed synchronization mechanism
Configuration service for rate limit rules
Monitoring and alerting system
Design Patterns
Fixed Window Counter
Sliding Window Log
Sliding Window Counter
Token Bucket
Leaky Bucket
Distributed Cache with Atomic Operations
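Of these patterns, the token bucket is a common default because it allows short bursts while enforcing a sustained rate. A minimal single-node sketch (class and parameter names are illustrative, not from any specific library):

```python
import time

class TokenBucket:
    """Single-node token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, rate=1.0)  # 5-request burst, 1 req/sec sustained
results = [bucket.allow() for _ in range(7)]
# First 5 calls drain the burst capacity; the rest are rejected until refill
```

In a distributed deployment the token count and last-refill timestamp would live in the shared cache rather than in process memory, updated atomically.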
Reference Architecture
Client
  |
  v
API Gateway / Proxy
  |
  v
Rate Limiter Service <--> Configuration Store
  |
  v
Distributed Cache (Redis Cluster)
  |
  v
Backend Services

Monitoring & Alerting System (observes Rate Limiter metrics)
Components
API Gateway / Proxy
Nginx, Envoy, or custom proxy
Intercept incoming requests and forward them to the rate limiter before they reach the backend
Rate Limiter Service
Stateless microservice in Go/Java/Python
Check and enforce rate limits using counters in distributed cache
Distributed Cache
Redis Cluster with atomic INCR and EXPIRE commands
Store counters for requests per user/key with TTL for time windows
Configuration Store
Relational DB or NoSQL (PostgreSQL, DynamoDB)
Store rate limit rules per user, API key, or endpoint
Monitoring & Alerting
Prometheus + Grafana or Datadog
Track rate limiter performance, errors, and usage patterns
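The Distributed Cache component above hinges on Redis's atomic INCR plus EXPIRE per time window. A single-process sketch of that pattern, with a plain dict standing in for Redis (names are illustrative):

```python
import time

class FixedWindowCounter:
    """Single-process stand-in for the Redis INCR + EXPIRE pattern.
    In production each (target, window) pair would be one Redis key,
    incremented with INCR and expired after `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (target, window_start) -> request count

    def allow(self, target, now=None):
        now = time.time() if now is None else now
        window_start = int(now // self.window) * self.window
        key = (target, window_start)
        # INCR: atomic in Redis; a plain increment suffices in one process
        self.counters[key] = self.counters.get(key, 0) + 1
        # EXPIRE is implicit here: keys for old windows are never read again
        return self.counters[key] <= self.limit

limiter = FixedWindowCounter(limit=3, window_seconds=60)
decisions = [limiter.allow("user-42", now=100 + i) for i in range(5)]
# First 3 requests in the window are allowed, the rest rejected
```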
Request Flow
1. Client sends request to API Gateway
2. API Gateway forwards request to Rate Limiter Service
3. Rate Limiter Service fetches applicable rate limit rules from Configuration Store or cache
4. Rate Limiter Service increments request counter in Distributed Cache atomically
5. If counter exceeds limit, Rate Limiter Service rejects request with 429 Too Many Requests
6. If under limit, Rate Limiter Service allows request to proceed to backend
7. Rate Limiter Service returns response to API Gateway
8. API Gateway forwards response to client
9. Monitoring system collects metrics from Rate Limiter Service
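Steps 3-6 of the flow can be condensed into one decision function. The rule table and counter dict below are simplified stand-ins for the Configuration Store and the distributed cache:

```python
# Illustrative rule table; real rules come from the Configuration Store
RULES = {"api-key-1": {"limit": 2, "window": 60}}
DEFAULT_RULE = {"limit": 100, "window": 60}
COUNTS = {}  # stands in for the distributed cache

def handle(api_key, now):
    """Return (status, headers) for one request, mirroring flow steps 3-6."""
    rule = RULES.get(api_key, DEFAULT_RULE)            # step 3: fetch rule
    window_start = int(now // rule["window"]) * rule["window"]
    key = (api_key, window_start)
    COUNTS[key] = COUNTS.get(key, 0) + 1               # step 4: atomic INCR
    remaining = max(0, rule["limit"] - COUNTS[key])
    headers = {"X-RateLimit-Limit": rule["limit"],
               "X-RateLimit-Remaining": remaining}
    if COUNTS[key] > rule["limit"]:                    # step 5: over limit
        headers["Retry-After"] = window_start + rule["window"] - now
        return 429, headers
    return 200, headers                                # step 6: forward to backend

statuses = [handle("api-key-1", now=10)[0] for _ in range(3)]
# → [200, 200, 429]
```

Returning `Retry-After` and the conventional `X-RateLimit-*` headers gives clients the real-time feedback called for in FR3.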
Database Schema
Entities:
- User (user_id, user_type, etc.)
- APIKey (key_id, user_id, permissions)
- RateLimitRule (rule_id, target_type [user/api_key/endpoint], target_id, limit_count, window_seconds)

Relationships:
- User 1:N APIKey
- RateLimitRule applies to a User, APIKey, or Endpoint

Counters are stored in Redis under keys "rate_limit:{target_id}:{window_start_timestamp}" holding an integer count, with TTL equal to window_seconds.
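Because the key embeds the window start, every server derives the same key deterministically without coordination. A sketch of the key construction (helper name is illustrative):

```python
def counter_key(target_id, window_seconds, now):
    """Build 'rate_limit:{target_id}:{window_start_timestamp}' for the
    fixed window containing `now` (a Unix timestamp in seconds)."""
    window_start = int(now // window_seconds) * window_seconds
    return f"rate_limit:{target_id}:{window_start}"

key = counter_key("user-42", 60, now=1700000123)
# → "rate_limit:user-42:1700000100"
# In Redis: INCR this key, then EXPIRE <key> 60 on the first increment
```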
Scaling Discussion
Bottlenecks
Distributed cache becoming a single point of failure or bottleneck under high QPS
Synchronization issues causing inaccurate counters due to race conditions
Latency increase due to network calls to cache for every request
Configuration store latency or stale rules causing incorrect enforcement
Handling sudden traffic spikes causing burst limit breaches
Solutions
Use Redis Cluster with sharding and replication for high availability and throughput
Use atomic increment operations and Lua scripts in Redis to ensure consistency
Implement local caching of rate limit rules to reduce configuration store calls
Use approximate algorithms (e.g., sliding window counters) to reduce strict locking
Implement burst handling with token bucket allowing short bursts without penalty
Deploy rate limiter service close to API gateway to reduce network latency
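The sliding-window-counter approximation mentioned above estimates the current rate from two fixed-window counters, weighting the previous window's count by how much of it still overlaps the sliding window. A sketch (function and parameter names are illustrative):

```python
def sliding_window_allow(prev_count, curr_count, limit, window_seconds, elapsed):
    """Approximate a true sliding window from two fixed-window counters.
    `elapsed` is how far we are into the current window, in seconds."""
    # Fraction of the previous window still inside the sliding window
    overlap = (window_seconds - elapsed) / window_seconds
    estimated = prev_count * overlap + curr_count
    return estimated < limit

# 30s into a 60s window: half of the previous window's count still applies
allowed = sliding_window_allow(prev_count=40, curr_count=75, limit=100,
                               window_seconds=60, elapsed=30)
# estimate = 40 * 0.5 + 75 = 95 < 100, so the request is allowed
rejected = sliding_window_allow(prev_count=40, curr_count=85, limit=100,
                                window_seconds=60, elapsed=30)
# estimate = 105 >= 100, so the request is rejected
```

This trades exactness for cheap reads (two counters instead of a per-request log), which is the accuracy-versus-performance trade-off noted in the interview tips.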
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing architecture and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Clarify types of rate limiting and use cases
Explain choice of data store and atomic operations for counters
Discuss trade-offs between accuracy and performance
Highlight distributed consistency challenges and solutions
Mention monitoring importance for operational health
Show awareness of scaling bottlenecks and mitigation strategies