
Rate limiting in Microservices - Scalability & System Analysis

Scalability Analysis - Rate limiting
Growth Table: Rate Limiting at Different Scales
| Users | Requests per Second (RPS) | Rate Limiter Type | Infrastructure Changes | Challenges |
|---|---|---|---|---|
| 100 | ~500 | In-process (local) rate limiting | Single microservice instance | Simple counters, low overhead |
| 10,000 | ~50,000 | Centralized rate limiter (Redis or API Gateway) | Multiple microservice instances, shared cache | Consistency, latency in shared store |
| 1,000,000 | ~5,000,000 | Distributed rate limiting with sharded stores | Multiple rate limiter clusters, load balancers | Data partitioning, synchronization, failover |
| 100,000,000 | ~500,000,000 | Hierarchical rate limiting with edge/CDN enforcement | Global distributed caches, edge nodes, multi-region | Network bandwidth, global consistency, cost |
First Bottleneck

At small scale, the first limitation is that in-process counters live in each microservice instance's memory: they are cheap and fast, but each instance enforces its limit independently, so accuracy degrades as soon as traffic is spread across replicas behind a load balancer. As traffic grows, the bottleneck shifts to the centralized data store (such as Redis) used for shared counters, which can be overwhelmed by high request rates and add latency to every request.
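A minimal sketch of the low-scale starting point: a fixed-window, in-process counter. The class name and window size are illustrative, not from any specific library; the point is that the state is plain process memory, so there are no network calls but also no coordination across instances.

```python
import threading
import time

class LocalRateLimiter:
    """Fixed-window counter kept in process memory: no network calls,
    but each service instance enforces its limit independently."""

    def __init__(self, limit, window_seconds=1.0):
        self.limit = limit
        self.window = window_seconds
        self.count = 0
        self.window_start = time.monotonic()
        self.lock = threading.Lock()  # counters are shared across request threads

    def allow(self):
        with self.lock:
            now = time.monotonic()
            if now - self.window_start >= self.window:
                # A new window has started: reset the counter.
                self.window_start = now
                self.count = 0
            if self.count < self.limit:
                self.count += 1
                return True
            return False

limiter = LocalRateLimiter(limit=3)
print([limiter.allow() for _ in range(5)])  # first 3 allowed, rest rejected
```

Note that two instances of this limiter behind a load balancer would together admit up to twice the intended rate, which is exactly why the bottleneck moves to a shared store as the fleet grows.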

Scaling Solutions
  • Local Rate Limiting: Use in-memory counters for low traffic to avoid network calls.
  • Centralized Store: Use Redis or Memcached with connection pooling for moderate scale.
  • Sharding: Partition keys by user or API key to distribute load across multiple Redis instances.
  • Hierarchical Rate Limiting: Combine edge (CDN or API Gateway) and backend limits to reduce backend load.
  • Token Bucket or Leaky Bucket Algorithms: Efficient algorithms to smooth bursts and reduce storage overhead.
  • Asynchronous Updates: Use approximate counters or probabilistic data structures to reduce write load.
  • Load Balancing: Distribute requests evenly to rate limiter clusters to avoid hotspots.
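The token bucket mentioned above is worth sketching, because it shows why these algorithms reduce storage overhead: each key needs only two numbers (current tokens and last refill time), regardless of request rate. This is a generic illustration, not a specific library's API.

```python
import time

class TokenBucket:
    """Token-bucket limiter: stores only (tokens, last_refill) per key,
    smooths bursts up to `capacity`, and sustains `rate` requests/sec."""

    def __init__(self, rate, capacity):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=10, capacity=5)
print([bucket.allow() for _ in range(6)])  # burst of 5 absorbed, 6th rejected
```

In a centralized setup the same two fields per key can be stored in Redis and updated atomically (for example via a Lua script), which keeps per-key storage constant even under heavy traffic.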
Back-of-Envelope Cost Analysis
  • At 10,000 users with 50,000 RPS, Redis must handle ~50,000 ops/sec, which approaches the practical throughput of a single instance (on the order of 100,000 simple ops/sec); plan for sharding or clustering before the next growth step.
  • A counter value is only a few bytes, but key names and per-entry store overhead push the real cost to roughly 100 bytes per counter; with per-user, per-endpoint, or sliding-window state, 1M users can reach several GB in Redis.
  • Network bandwidth for rate limiter calls grows with RPS; at 5M RPS, requires multiple high-throughput network links.
  • CPU usage on microservices increases with local rate limiting logic; offloading to dedicated rate limiter services can reduce this.
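The estimates above follow from quick arithmetic. The per-counter byte cost, counters per user, and per-shard throughput below are assumptions for illustration, not measured figures:

```python
# Back-of-envelope sketch; all constants are assumed, not measured.
users = 1_000_000
bytes_per_counter = 100      # assumed: key name + value + store overhead
counters_per_user = 10       # assumed: e.g. per-endpoint limits
memory_bytes = users * bytes_per_counter * counters_per_user
print(f"Counter memory: {memory_bytes / 1e9:.2f} GB")

rps = 5_000_000
ops_per_shard = 100_000      # assumed: simple INCR throughput per Redis instance
shards_needed = -(-rps // ops_per_shard)   # ceiling division
print(f"Redis shards needed: {shards_needed}")
```

Even rough numbers like these are useful in an interview: they show whether a single instance suffices or whether sharding is unavoidable at the target scale.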
Interview Tip

Start by clarifying the scale and traffic patterns. Discuss simple local rate limiting first, then explain how centralized stores become bottlenecks. Describe sharding and hierarchical approaches. Emphasize trade-offs between accuracy, latency, and cost. Use real numbers to show understanding of limits and solutions.

Self Check Question

Your database handles 1000 QPS for rate limiting counters. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Shard or cache to distribute the load. Note that rate-limit counters are write-heavy (every request is an increment), so read replicas help little; the more effective first step is to partition counters by user ID or API key across multiple Redis instances so no single node absorbs all 10,000 QPS.
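Partitioning counters can be sketched as a stable hash over the user ID that picks a shard, so every increment for a given user lands on the same instance. The shard addresses here are hypothetical placeholders:

```python
import hashlib

# Hypothetical shard addresses for illustration.
REDIS_SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379", "redis-3:6379"]

def shard_for(user_id, shards=REDIS_SHARDS):
    """Route a user's counter to a fixed shard. Uses SHA-256 rather than
    Python's built-in hash(), which is not stable across processes."""
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

# Every request for the same user maps to the same shard.
assert shard_for("user-42") == shard_for("user-42")
print(shard_for("user-42"))
```

Simple modulo hashing works until you resize the shard set; at that point consistent hashing limits how many keys move between instances.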

Key Result
Rate limiting scales from simple in-memory counters at low traffic to distributed, sharded, and hierarchical systems at high traffic. The first bottleneck is usually the centralized data store for counters, which requires sharding and caching to scale efficiently.