HLD · How-To · Intermediate · 4 min read

How to Design a Rate Limiter: Simple and Scalable Approach

To design a rate limiter, choose an algorithm like fixed window, sliding window, or token bucket to control how many requests a user can make in a time frame. Implement it using counters or tokens stored in memory or a fast database, and ensure it scales by distributing limits across servers or using centralized stores.
📝

Syntax

A rate limiter typically involves these parts:

  • Key: Identifier for the user or client (e.g., user ID or IP address).
  • Limit: Maximum allowed requests in a time window.
  • Window: Time duration for the limit (e.g., 1 minute).
  • Counter or Tokens: Tracks requests made or tokens available.
  • Algorithm: Logic to allow or reject requests based on counters or tokens.
python
# Sliding-window log: stores one timestamp per request, per key
class RateLimiter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.requests = {}  # key: [timestamps]

    def allow_request(self, key: str, timestamp: int) -> bool:
        if key not in self.requests:
            self.requests[key] = []
        # Remove timestamps outside the window
        self.requests[key] = [t for t in self.requests[key] if t > timestamp - self.window]
        if len(self.requests[key]) < self.limit:
            self.requests[key].append(timestamp)
            return True
        return False
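To make the snippet above concrete, here is a short usage sketch (the class from above is repeated so the example runs standalone; timestamps are passed in explicitly, which keeps the behavior deterministic and easy to test):

```python
class RateLimiter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.requests = {}  # key: [timestamps]

    def allow_request(self, key: str, timestamp: int) -> bool:
        if key not in self.requests:
            self.requests[key] = []
        # Drop timestamps that have fallen out of the window
        self.requests[key] = [t for t in self.requests[key] if t > timestamp - self.window]
        if len(self.requests[key]) < self.limit:
            self.requests[key].append(timestamp)
            return True
        return False

limiter = RateLimiter(2, 5)  # 2 requests per 5 seconds
print(limiter.allow_request("user1", 0))  # True
print(limiter.allow_request("user1", 1))  # True
print(limiter.allow_request("user1", 2))  # False: window already full
print(limiter.allow_request("user1", 7))  # True: earlier requests expired
```

Because the window slides with every request, there is no boundary at which a burst of 2×limit can slip through, unlike a fixed window.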
💻

Example

This example shows a simple fixed window rate limiter that allows 3 requests per 10 seconds per user.

python
import time

class FixedWindowRateLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # key: (window_start, count)

    def allow_request(self, key):
        current_time = int(time.time())
        window_start = current_time - (current_time % self.window)
        if key not in self.counters or self.counters[key][0] != window_start:
            self.counters[key] = (window_start, 0)
        window, count = self.counters[key]
        if count < self.limit:
            self.counters[key] = (window, count + 1)
            return True
        return False

limiter = FixedWindowRateLimiter(3, 10)
user = "user1"
for i in range(5):
    allowed = limiter.allow_request(user)
    print(f"Request {i+1} allowed: {allowed}")
Output
Request 1 allowed: True
Request 2 allowed: True
Request 3 allowed: True
Request 4 allowed: False
Request 5 allowed: False
⚠️

Common Pitfalls

  • Using fixed window without sliding: Can cause bursts at window edges.
  • Storing too much data: Tracking every request timestamp can consume memory.
  • Not handling distributed systems: Local counters fail when requests hit multiple servers.
  • Ignoring clock skew: Different servers may have different times, causing inconsistent limits.

Use sliding window or token bucket algorithms and centralized stores like Redis for better accuracy and scalability.
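As a sketch of the token bucket approach, the version below keeps tokens in process memory; the capacity and refill rate are illustrative parameters, and a production version would keep this state in a shared store like Redis:

```python
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float, now: float = None):
        self.capacity = capacity        # max tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last_refill = time.monotonic() if now is None else now

    def allow_request(self, now: float = None) -> bool:
        if now is None:
            now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_rate=1, now=0.0)  # burst of 3, 1 token/sec
print(bucket.allow_request(0.0))  # True
print(bucket.allow_request(0.0))  # True
print(bucket.allow_request(0.0))  # True
print(bucket.allow_request(0.0))  # False: burst exhausted
print(bucket.allow_request(2.0))  # True: 2 tokens refilled in the meantime
```

Note the bucket never stores per-request timestamps; two floats per key are enough, which is why token buckets scale well.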

python
class WrongRateLimiter:
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow_request(self):
        # No time window, just counts requests forever
        if self.count < self.limit:
            self.count += 1
            return True
        return False

class RightRateLimiter:
    # Sliding-window log; tracks a single client for brevity
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = []

    def allow_request(self, timestamp):
        self.timestamps = [t for t in self.timestamps if t > timestamp - self.window]
        if len(self.timestamps) < self.limit:
            self.timestamps.append(timestamp)
            return True
        return False
📊

Quick Reference

  • Fixed Window: Simple, counts requests in fixed time slots, can cause bursts.
  • Sliding Window: More accurate, tracks requests in a moving time frame.
  • Token Bucket: Allows bursts by accumulating tokens, refills tokens over time.
  • Leaky Bucket: Smooths out bursts by processing requests at a fixed rate.
  • Storage: Use in-memory cache for single server, Redis or distributed cache for multiple servers.
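For comparison with the token bucket, here is a minimal leaky-bucket sketch treated as a meter: each request adds one unit of "water", the bucket drains at a fixed rate, and requests that would overflow are rejected (parameter names are illustrative):

```python
class LeakyBucket:
    def __init__(self, capacity: float, leak_rate: float, now: float = 0.0):
        self.capacity = capacity    # max queued units
        self.leak_rate = leak_rate  # units drained per second
        self.level = 0.0
        self.last_check = now

    def allow_request(self, now: float) -> bool:
        # Drain the bucket for the time elapsed since the last check
        elapsed = now - self.last_check
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_check = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False

bucket = LeakyBucket(capacity=2, leak_rate=1)
print(bucket.allow_request(0.0))  # True  (level 1)
print(bucket.allow_request(0.0))  # True  (level 2, bucket full)
print(bucket.allow_request(0.0))  # False (would overflow)
print(bucket.allow_request(1.0))  # True  (1 unit leaked out)
```

Unlike the token bucket, which permits bursts up to its capacity, the leaky bucket enforces a smooth output rate once full.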
✅

Key Takeaways

  • Choose the right algorithm (fixed window, sliding window, token bucket) based on accuracy and burst-tolerance needs.
  • Use centralized storage like Redis for distributed systems to keep counters consistent.
  • Avoid storing every request timestamp to save memory; use counters or token counts instead.
  • Consider clock synchronization and network delays in distributed rate limiting.
  • Test your rate limiter under load to ensure it scales and behaves as expected.
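The memory-saving takeaway above is often realized as a sliding-window counter, which approximates the sliding window by keeping only two per-window counts and weighting the previous window by its remaining overlap (a common Redis-backed pattern; the sketch below is in-memory and the names are illustrative):

```python
class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # key: {window_start: count}, current and previous windows only

    def allow_request(self, key: str, timestamp: float) -> bool:
        curr_start = int(timestamp // self.window) * self.window
        prev_start = curr_start - self.window
        buckets = self.counts.setdefault(key, {})
        # Weight the previous window's count by how much of it still overlaps
        overlap = 1 - (timestamp - curr_start) / self.window
        estimate = buckets.get(prev_start, 0) * overlap + buckets.get(curr_start, 0)
        if estimate < self.limit:
            buckets[curr_start] = buckets.get(curr_start, 0) + 1
            # Keep only the current and previous windows to bound memory
            for start in list(buckets):
                if start < prev_start:
                    del buckets[start]
            return True
        return False

limiter = SlidingWindowCounter(3, 10)
print(limiter.allow_request("user1", 0))   # True
print(limiter.allow_request("user1", 1))   # True
print(limiter.allow_request("user1", 2))   # True
print(limiter.allow_request("user1", 3))   # False: estimate at the limit
print(limiter.allow_request("user1", 11))  # True: previous window weighted at 0.9
```

This trades a small amount of accuracy (the previous window is assumed to be uniformly distributed) for constant memory per key.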