Rate limiting in Microservices - Scalability & System Analysis

Scalability Analysis - Rate limiting

Growth Table: Rate Limiting at Different Scales

Users	Requests per Second (RPS)	Rate Limiter Type	Infrastructure Changes	Challenges
100 users	~500 RPS	In-process (local) rate limiting	Single microservice instance	Simple counters, low overhead
10,000 users	~50,000 RPS	Centralized rate limiter (Redis or API Gateway)	Multiple microservice instances, shared cache	Consistency, latency in shared store
1,000,000 users	~5,000,000 RPS	Distributed rate limiting with sharded stores	Multiple rate limiter clusters, load balancers	Data partitioning, synchronization, failover
100,000,000 users	~500,000,000 RPS	Hierarchical rate limiting with edge/CDN enforcement	Global distributed caches, edge nodes, multi-region	Network bandwidth, global consistency, cost

First Bottleneck

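At small scale, the first bottleneck is the in-process memory holding the counters in each microservice instance: the counters are not shared, so each instance enforces the limit on its own, and a client whose requests are spread across N instances can get up to N times the intended limit. Moving the counters to a centralized data store (such as Redis) fixes this, but the bottleneck then shifts to the store itself: every request adds a round trip to it, so high request rates can overwhelm it and add latency to every call.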

Scaling Solutions

Local Rate Limiting: Use in-memory counters for low traffic to avoid network calls.
Centralized Store: Use Redis or Memcached with connection pooling for moderate scale.
Sharding: Partition keys by user or API key to distribute load across multiple Redis instances.
Hierarchical Rate Limiting: Combine edge (CDN or API Gateway) and backend limits to reduce backend load.
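Token Bucket or Leaky Bucket Algorithms: A token bucket enforces an average rate while admitting bursts up to the bucket capacity; a leaky bucket drains at a constant rate to smooth bursts. Both keep only a small, constant amount of state per key (a token count or fill level plus a timestamp), far less than logging every request.
Asynchronous Updates: Batch increments locally and sync them to the shared store periodically, or use approximate counters or probabilistic data structures (such as a count-min sketch), to reduce write load at the cost of some accuracy.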
Load Balancing: Distribute requests evenly to rate limiter clusters to avoid hotspots.
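As a minimal sketch of one of the algorithms listed above, the Python class below implements the "leaky bucket as a meter" variant for a single key. The class name, parameters, and injectable clock are illustrative assumptions, not a specific library's API; in a shared deployment the `level` and `last` fields would live in the central store, one pair per key. In this meter form the leaky bucket is equivalent to a token bucket; the queue form instead holds requests and releases them at a constant rate, which is what smooths bursts.

```python
import time
from typing import Callable


class LeakyBucket:
    """Leaky bucket as a meter, for one key: each admitted request adds one
    unit to the bucket, which drains at `leak_rate` units per second. A
    request that would overflow `capacity` is rejected."""

    def __init__(self, capacity: float, leak_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity    # largest backlog (burst) the bucket holds
        self.leak_rate = leak_rate  # sustained rate, in requests per second
        self.clock = clock          # injectable clock, handy for tests
        self.level = 0.0            # current fill level
        self.last = clock()         # time of the previous update

    def allow(self) -> bool:
        now = self.clock()
        # Drain what leaked out since the last update, never below empty.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False


# Usage with a fake clock: capacity 3, draining 1 request per second.
t = [0.0]
bucket = LeakyBucket(capacity=3, leak_rate=1.0, clock=lambda: t[0])
print([bucket.allow() for _ in range(5)])  # [True, True, True, False, False]
t[0] = 1.0
print(bucket.allow())                      # True: one unit drained
```

Only two numbers are stored per key, which is why bucket algorithms stay cheap even with millions of keys.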

Back-of-Envelope Cost Analysis

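At 10,000 users with 50,000 RPS, Redis must handle at least 50,000 ops/sec, and more if each check issues several commands (for example, INCR followed by EXPIRE). That leaves little headroom on a single Redis instance, so plan for sharding or clustering.
A counter needs only a few bytes of payload, but Redis adds per-key overhead, so budget on the order of 100 bytes per key: 1M users with one counter each need roughly 100 MB. Storage grows into GBs only with many keys per user (for example, one per endpoint) or with sliding-window logs that store a timestamp for every request.
Network bandwidth for rate limiter calls grows linearly with RPS; assuming roughly 100 bytes per check (request plus response), 5M RPS generates about 4 Gbit/s, which calls for multiple high-throughput network links.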
CPU usage on microservices increases with local rate limiting logic; offloading to dedicated rate limiter services can reduce this.
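The same back-of-envelope arithmetic can be written down as a small calculator. This is a rough sketch: every constant in it (two commands per check, a 50,000 ops/sec budget per Redis node, 100 bytes per key and per check) is a planning assumption to replace with measured figures, not a benchmark result.

```python
import math

# Planning assumptions (hypothetical; measure your own hardware and payloads).
OPS_PER_CHECK = 2          # e.g. INCR + EXPIRE per rate-limit check
OPS_PER_SHARD = 50_000     # conservative ops/sec budget per Redis node
BYTES_PER_KEY = 100        # counter payload plus per-key overhead
BYTES_PER_CHECK = 100      # request + response on the wire


def shards_needed(rps: int) -> int:
    """Redis nodes needed so each stays within its ops/sec budget."""
    return math.ceil(rps * OPS_PER_CHECK / OPS_PER_SHARD)


def counter_memory_mb(users: int, keys_per_user: int = 1) -> float:
    """Memory for the counter keys, in megabytes."""
    return users * keys_per_user * BYTES_PER_KEY / 1_000_000


def limiter_bandwidth_gbps(rps: int) -> float:
    """Network traffic generated by rate-limit checks, in gigabits/sec."""
    return rps * BYTES_PER_CHECK * 8 / 1_000_000_000


print(shards_needed(50_000))              # 2
print(shards_needed(5_000_000))           # 200
print(counter_memory_mb(1_000_000))       # 100.0
print(limiter_bandwidth_gbps(5_000_000))  # 4.0
```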

Interview Tip

Start by clarifying the scale and traffic patterns. Discuss simple local rate limiting first, then explain how centralized stores become bottlenecks. Describe sharding and hierarchical approaches. Emphasize trade-offs between accuracy, latency, and cost. Use real numbers to show understanding of limits and solutions.

Self Check Question

Your database handles 1000 QPS for rate limiting counters. Traffic grows 10x to 10,000 QPS. What do you do first?

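Answer: Move the counters off the database into an in-memory store such as Redis, and shard if one node is not enough, for example by partitioning counters by user ID across multiple Redis instances. Read replicas help little here, because every rate-limit check increments a counter, and that write must go to the primary.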
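Partitioning counters by user ID only works if every service instance maps a given user to the same Redis instance. A minimal sketch of that mapping, assuming a fixed list of hypothetical shard addresses; it uses a stable digest because Python's built-in `hash()` for strings is salted per process and would send the same user to different shards from different instances.

```python
import hashlib

# Hypothetical shard addresses; in practice these would be Redis endpoints.
SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379", "redis-3:6379"]


def shard_for(user_id: str, shards: list[str] = SHARDS) -> str:
    """Map a user ID to one shard, identically in every process."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take a modulo.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


print(shard_for("user-42"))
print(shard_for("user-42") == shard_for("user-42"))  # True
```

Plain modulo hashing remaps most keys when the shard count changes; consistent hashing, or Redis Cluster's built-in hash slots, limits that reshuffling.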

Key Result

Rate limiting scales from simple in-memory counters at low traffic to distributed, sharded, and hierarchical systems at high traffic. The first bottleneck is usually the centralized data store for counters, which requires sharding and caching to scale efficiently.

Practice

1. What is the main purpose of rate limiting in microservices?

easy

A. To control how many requests a user can make in a given time

B. To increase the speed of the service

C. To store user data securely

D. To balance the load between servers

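Solution

Step 1: Understand the concept of rate limiting. Rate limiting caps how many requests a client (a user, API key, or IP address) may make within a time window; requests over the cap are rejected or delayed.

Step 2: Identify the main goal of rate limiting. The goal is to protect services from overload and abuse and to share capacity fairly between clients. It does not make the service faster (B) or store data (C), and spreading traffic across servers (D) is the job of a load balancer.

Final Answer: A. To control how many requests a user can make in a given time

Quick Check: Rate limiting = a cap on requests per client per time window → A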

Solution

Step 1: Understand fixed window rate limiting logic

Step 2: Match the correct condition for allowing or blocking
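A fixed window limiter allows a request only while the counter for the current window is below the limit, and blocks it otherwise. A minimal in-memory sketch of that condition, assuming one counter per key per window; the class and parameter names are illustrative, not from a particular library.

```python
import time
from collections import defaultdict
from typing import Callable


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each window of `window_s`
    seconds. The window index is floor(now / window_s), so all keys share
    the same window boundaries."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        # (key, window index) -> count. Old windows are never pruned here;
        # a shared store would expire them with a TTL.
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str) -> bool:
        slot = (key, int(self.clock() // self.window_s))
        if self.counts[slot] < self.limit:  # under the limit: count and allow
            self.counts[slot] += 1
            return True
        return False                         # limit reached: block


t = [0.0]
limiter = FixedWindowLimiter(limit=2, window_s=60, clock=lambda: t[0])
print([limiter.allow("alice") for _ in range(3)])  # [True, True, False]
t[0] = 60.0  # a new window starts, so the count starts from zero
print(limiter.allow("alice"))                      # True
```

The known weakness is the window boundary: a client can send `limit` requests just before a boundary and `limit` more just after it, briefly doubling the rate.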

Solution

Step 1: Check current tokens against requested tokens

Step 2: Determine if request is allowed or blocked
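In a token bucket, a request that asks for `cost` tokens is allowed only if the bucket currently holds at least that many; otherwise it is blocked and no tokens are spent. A minimal single-key sketch, assuming continuous refill and an injectable clock for testing (both illustrative choices, not a particular library's API).

```python
import time
from typing import Callable


class TokenBucket:
    """Tokens refill continuously at `rate` per second up to `capacity`;
    a request costing `cost` tokens is allowed only if that many are left."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:  # enough tokens: spend them and allow
            self.tokens -= cost
            return True
        return False             # not enough: block, tokens unchanged


t = [0.0]
bucket = TokenBucket(capacity=5, rate=1.0, clock=lambda: t[0])
print(bucket.allow(cost=3))  # True  (5 -> 2 tokens)
print(bucket.allow(cost=3))  # False (only 2 tokens left)
t[0] = 1.0
print(bucket.allow(cost=3))  # True  (2 + 1 refilled = 3 -> 0)
```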

Solution

Step 1: Understand sliding window rate limiter behavior

Step 2: Identify issue with multiple servers and no shared state
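To illustrate the multi-server issue, the sketch below runs a per-process sliding window log on two hypothetical instances behind round-robin load balancing. Because neither instance sees the other's log, a client gets up to N times the limit through N instances. The class name and numbers are illustrative assumptions.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLog:
    """Keep timestamps of admitted requests; allow a new one only if fewer
    than `limit` fall within the last `window_s` seconds. The log lives in
    this process only."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.log: deque[float] = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.log and self.log[0] <= now - self.window_s:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False


# Two instances, each with its own log, receive requests in turn.
t = [0.0]
servers = [SlidingWindowLog(limit=5, window_s=60, clock=lambda: t[0])
           for _ in range(2)]
admitted = sum(servers[i % 2].allow() for i in range(20))
print(admitted)  # 10: twice the intended limit of 5 per minute
```

Sharing the log (for example, one Redis sorted set per key) or routing each client to the same instance closes the gap, at the cost of a network round trip or less even load.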

Solution

Step 1: Analyze scalability needs for 10 million users

Step 2: Evaluate distributed token bucket with local caches

Step 3: Consider client-side rate limiting
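One way to combine a distributed token bucket with local caches is token leasing: each instance takes a batch of tokens from the shared budget and serves requests locally until the batch runs out. The sketch below is a hypothetical single-process simulation of that idea; `CentralBudget` stands in for the shared store, and the per-period refill of the budget is omitted.

```python
class CentralBudget:
    """Stand-in for the shared store: a fixed number of tokens for the
    current period, leased out to instances in batches."""

    def __init__(self, tokens: int) -> None:
        self.tokens = tokens
        self.calls = 0  # round trips made to the "store"

    def lease(self, want: int) -> int:
        self.calls += 1
        granted = min(want, self.tokens)
        self.tokens -= granted
        return granted


class LocalLeaseLimiter:
    """Per-instance cache: serve from locally leased tokens and contact the
    central budget only when the local supply runs out."""

    def __init__(self, budget: CentralBudget, batch: int) -> None:
        self.budget = budget
        self.batch = batch
        self.local = 0
        self.exhausted = False

    def allow(self) -> bool:
        if self.local == 0 and not self.exhausted:
            self.local = self.budget.lease(self.batch)
            # An empty lease means the global budget is spent for this
            # period; stop asking until reset() is called.
            self.exhausted = self.local == 0
        if self.local > 0:
            self.local -= 1
            return True
        return False

    def reset(self) -> None:
        """Call at the start of each period, after the budget is refilled."""
        self.exhausted = False


budget = CentralBudget(tokens=100)
instances = [LocalLeaseLimiter(budget, batch=10) for _ in range(3)]
admitted = sum(instances[i % 3].allow() for i in range(150))
print(admitted, budget.calls)  # 100 13
```

In this run, 150 requests produce 100 admissions and only 13 round trips to the shared budget instead of one per request, and the global limit still holds. The trade-off is that tokens leased to an instance that goes quiet sit unused until the period resets.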
