How to Design a Rate Limiter: Simple and Scalable Approach
To design a
rate limiter, choose an algorithm like fixed window, sliding window, or token bucket to control how many requests a user can make in a time frame. Implement it using counters or tokens stored in memory or a fast database, and ensure it scales by distributing limits across servers or using centralized stores.Syntax
A rate limiter typically involves these parts:
- Key: Identifier for the user or client (e.g., user ID or IP address).
- Limit: Maximum allowed requests in a time window.
- Window: Time duration for the limit (e.g., 1 minute).
- Counter or Tokens: Tracks requests made or tokens available.
- Algorithm: Logic to allow or reject requests based on counters or tokens.
python
class RateLimiter: def __init__(self, limit: int, window_seconds: int): self.limit = limit self.window = window_seconds self.requests = {} # key: [timestamps] def allow_request(self, key: str, timestamp: int) -> bool: if key not in self.requests: self.requests[key] = [] # Remove timestamps outside the window self.requests[key] = [t for t in self.requests[key] if t > timestamp - self.window] if len(self.requests[key]) < self.limit: self.requests[key].append(timestamp) return True return False
Example
This example shows a simple fixed window rate limiter that allows 3 requests per 10 seconds per user.
python
import time class FixedWindowRateLimiter: def __init__(self, limit, window_seconds): self.limit = limit self.window = window_seconds self.counters = {} # key: (window_start, count) def allow_request(self, key): current_time = int(time.time()) window_start = current_time - (current_time % self.window) if key not in self.counters or self.counters[key][0] != window_start: self.counters[key] = (window_start, 0) window, count = self.counters[key] if count < self.limit: self.counters[key] = (window, count + 1) return True return False limiter = FixedWindowRateLimiter(3, 10) user = "user1" for i in range(5): allowed = limiter.allow_request(user) print(f"Request {i+1} allowed: {allowed}")
Output
Request 1 allowed: True
Request 2 allowed: True
Request 3 allowed: True
Request 4 allowed: False
Request 5 allowed: False
Common Pitfalls
- Using fixed window without sliding: Can cause bursts at window edges.
- Storing too much data: Tracking every request timestamp can consume memory.
- Not handling distributed systems: Local counters fail when requests hit multiple servers.
- Ignoring clock skew: Different servers may have different times causing inconsistent limits.
Use sliding window or token bucket algorithms and centralized stores like Redis for better accuracy and scalability.
python
class WrongRateLimiter: def __init__(self, limit): self.limit = limit self.count = 0 def allow_request(self): # No time window, just counts requests forever if self.count < self.limit: self.count += 1 return True return False class RightRateLimiter: def __init__(self, limit, window_seconds): self.limit = limit self.window = window_seconds self.timestamps = [] def allow_request(self, timestamp): self.timestamps = [t for t in self.timestamps if t > timestamp - self.window] if len(self.timestamps) < self.limit: self.timestamps.append(timestamp) return True return False
Quick Reference
- Fixed Window: Simple, counts requests in fixed time slots, can cause bursts.
- Sliding Window: More accurate, tracks requests in a moving time frame.
- Token Bucket: Allows bursts by accumulating tokens, refills tokens over time.
- Leaky Bucket: Smooths out bursts by processing requests at a fixed rate.
- Storage: Use in-memory cache for single server, Redis or distributed cache for multiple servers.
Key Takeaways
Choose the right algorithm (fixed window, sliding window, token bucket) based on accuracy and burst tolerance needs.
Use centralized storage like Redis for distributed systems to keep counters consistent.
Avoid storing every request timestamp to save memory; use counters or token counts instead.
Consider clock synchronization and network delays in distributed rate limiting.
Test your rate limiter under load to ensure it scales and behaves as expected.