Rate limiting in Microservices - System Design Exercise

Design: Rate Limiting System for Microservices
Design the rate limiting mechanism and its integration with microservices APIs. Out of scope: detailed API business logic, user authentication mechanisms.
Functional Requirements
FR1: Limit the number of requests a user or client can make to an API within a given time window
FR2: Support different rate limits for different users or API keys
FR3: Provide real-time feedback when limits are exceeded
FR4: Ensure rate limiting works correctly in a distributed microservices environment
FR5: Allow configuration changes without downtime
Non-Functional Requirements
NFR1: Handle up to 100,000 requests per second across all services
NFR2: Enforce limits with p99 latency under 50ms
NFR3: Achieve 99.9% availability
NFR4: Support horizontal scaling of microservices
NFR5: Avoid single points of failure
Think Before You Design
Key Components
API Gateway or Edge Proxy
Distributed Cache or In-memory Store (e.g., Redis)
Rate Limiter Service or Middleware
Configuration Management Service
Monitoring and Alerting System
Design Patterns
Token Bucket or Leaky Bucket algorithms
Fixed Window vs Sliding Window counters
Centralized vs Distributed rate limiting
Client-side vs Server-side enforcement
Circuit Breaker pattern for overload protection
Reference Architecture
Client
  |
  v
API Gateway / Edge Proxy (with Rate Limiter Middleware)
  |-- counters ------> Distributed Cache (Redis Cluster)
  |<- rules ---------- Configuration Service
  |-- metrics -------> Monitoring & Alerting
  |
  v
Microservices
Components
API Gateway / Edge Proxy (Envoy, NGINX, or Kong): intercepts incoming requests and enforces rate limits before forwarding to microservices
Rate Limiter Middleware (custom middleware or Envoy filter): checks and updates request counts per user/key using the distributed cache
Distributed Cache (Redis Cluster): stores counters and timestamps for rate limiting with low latency
Configuration Service (central config store, e.g., Consul, etcd): manages rate limit rules and allows dynamic updates
Monitoring & Alerting (Prometheus + Grafana): tracks rate limit usage, errors, and system health
Request Flow
1. Client sends request to API Gateway
2. API Gateway extracts user identity or API key
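3. Rate Limiter Middleware atomically increments the counter for the user/key in Redis and reads back the new count (a separate read followed by a write would let concurrent requests slip past the limit)
4. If the new count is within the limit, forward the request to the microservice
5. If the new count exceeds the limit, respond with HTTP 429 Too Many Requests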
6. Configuration Service provides rate limit rules to middleware dynamically
7. Monitoring system collects metrics on rate limiting events and system performance
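The gateway-side part of this flow (steps 2 to 5) can be sketched as a fixed-window check. This is a minimal illustration, not a production middleware: `InMemoryCounterStore` is a hypothetical stand-in for Redis, `handle_request` is an assumed name, and `X-RateLimit-Remaining` is a common but non-standard header convention. The sketch increments first and compares afterwards, so the check and the update are a single store operation.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple


class InMemoryCounterStore:
    """Hypothetical stand-in for Redis: one increment operation per key."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = defaultdict(int)

    def incr(self, key: str) -> int:
        # A real shared store would perform this atomically and attach an
        # expiry to the key so that old windows are cleaned up.
        self._counts[key] += 1
        return self._counts[key]


def handle_request(
    api_key: str,
    store: InMemoryCounterStore,
    max_requests: int,
    window_seconds: int,
    now: Callable[[], float] = time.time,
) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, headers) for one request."""
    current = now()
    window_start = int(current // window_seconds) * window_seconds
    key = f"rate_limit:{api_key}:{window_start}"
    count = store.incr(key)  # increment, then compare the returned value
    if count > max_requests:
        retry_after = window_start + window_seconds - int(current)
        return 429, {"Retry-After": str(max(retry_after, 1))}
    return 200, {"X-RateLimit-Remaining": str(max_requests - count)}
```

With a limit of 3 per window, the first three calls for a key return 200 and later calls in the same window return 429 with a `Retry-After` hint.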
Database Schema
Entities:
- User or API Key: id, rate_limit_policy_id
- Rate Limit Policy: id, max_requests, window_size_seconds
- Counters stored in Redis as keys: "rate_limit:{user_id}:{window_start_timestamp}" with integer count
Relationships:
- Each User/API Key references one Rate Limit Policy
- Counters are ephemeral and reset after window expires
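As a rough illustration of these entities, the sketch below models them as Python dataclasses and derives the counter key from the policy's window size. The class and function names (`RateLimitPolicy`, `ApiClient`, `counter_key`, `counter_ttl`) are assumptions made for this example, and the choice of a TTL of two windows is one possible way to make counters ephemeral, not something the schema prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitPolicy:
    id: str
    max_requests: int
    window_size_seconds: int


@dataclass(frozen=True)
class ApiClient:
    id: str
    rate_limit_policy_id: str  # each client references exactly one policy


def counter_key(client: ApiClient, policy: RateLimitPolicy, now_seconds: int) -> str:
    """Build the counter key for the window that contains now_seconds."""
    window_start = now_seconds - (now_seconds % policy.window_size_seconds)
    return f"rate_limit:{client.id}:{window_start}"


def counter_ttl(policy: RateLimitPolicy) -> int:
    """Assumed expiry: keep the key a little longer than one window."""
    return policy.window_size_seconds * 2
```

All timestamps inside the same window map to the same key, so a new window automatically starts from a fresh counter.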
Scaling Discussion
Bottlenecks
Redis becoming a single point of failure or performance bottleneck
API Gateway overload due to high request volume
Latency increase due to network calls to distributed cache
Configuration updates causing inconsistent rate limits across nodes
Solutions
Use Redis Cluster with sharding and replication for high availability and throughput
Deploy multiple API Gateway instances behind a load balancer
Use local caches with short TTLs to reduce Redis calls, accepting slight eventual consistency
Implement versioned configuration with atomic updates and cache invalidation
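One way to picture versioned configuration with atomic updates is an immutable rule snapshot that each gateway node swaps in as a whole. The sketch below is an assumption-laden illustration: `RuleSet` and `RuleCache` are hypothetical names, and how snapshots arrive from the configuration service (polling, watch, push) is left out.

```python
import threading
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RuleSet:
    version: int
    limits: Mapping[str, int]  # policy id -> max requests per window


class RuleCache:
    """Holds the rule snapshot a node currently enforces."""

    def __init__(self, initial: RuleSet) -> None:
        self._lock = threading.Lock()
        self._current = initial

    def apply(self, candidate: RuleSet) -> bool:
        """Swap in a snapshot only if it is newer; stale updates are ignored."""
        with self._lock:
            if candidate.version <= self._current.version:
                return False
            self._current = candidate
            return True

    def limit_for(self, policy_id: str, default: int) -> int:
        rules = self._current  # read one reference; the snapshot never mutates
        return rules.limits.get(policy_id, default)
```

Because a request only ever sees one complete snapshot, a node never enforces a mix of old and new rules, and out-of-order deliveries of older versions are rejected.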
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing the architecture and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Explain different rate limiting algorithms and why you chose one
Discuss trade-offs between strict consistency and performance
Highlight how the design handles a distributed microservices environment
Mention how dynamic configuration and monitoring improve operability
Address potential bottlenecks and scaling strategies

Practice

1. What is the main purpose of rate limiting in microservices?
easy
A. To control how many requests a user can make in a given time
B. To increase the speed of the service
C. To store user data securely
D. To balance the load between servers

Solution

  1. Step 1: Understand the concept of rate limiting

    Rate limiting is designed to restrict the number of requests a user or client can send to a service within a certain time frame.
  2. Step 2: Identify the main goal of rate limiting

    The main goal is to prevent overload and abuse by controlling request frequency, not to speed up services or store data.
  3. Final Answer:

    To control how many requests a user can make in a given time -> Option A
  4. Quick Check:

    Rate limiting = Control request count [OK]
Hint: Rate limiting limits request count per time [OK]
Common Mistakes:
  • Confusing rate limiting with load balancing
  • Thinking rate limiting speeds up the service
  • Mixing rate limiting with data storage
2. Which of the following is the correct way to represent a fixed window rate limiter allowing 100 requests per minute in pseudocode?
easy
A. if requests_in_last_minute < 100 then block else allow
B. if requests_in_last_hour > 100 then block else allow
C. if requests_in_last_minute > 100 then block else allow
D. if requests_in_last_second > 100 then allow else block

Solution

  1. Step 1: Understand fixed window rate limiting logic

    Fixed window rate limiting counts requests in a fixed time window (e.g., 1 minute) and blocks if the count exceeds the limit.
  2. Step 2: Match the correct condition for allowing or blocking

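    If the count for the current minute, including the incoming request, exceeds 100, block; otherwise, allow. Option C, "if requests_in_last_minute > 100 then block else allow", matches this logic. (If the counter excluded the incoming request, the test would need to be >= 100 to avoid admitting a 101st request.)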
  3. Final Answer:

    if requests_in_last_minute > 100 then block else allow -> Option C
  4. Quick Check:

    Fixed window limit = block if over limit [OK]
Hint: Block when requests exceed limit in fixed window [OK]
Common Mistakes:
  • Using wrong time window (hour instead of minute)
  • Reversing the condition (blocking when under limit)
  • Allowing requests when they should be blocked
3. Given this pseudocode for a token bucket rate limiter:
bucket_capacity = 5
refill_rate = 1 token per second
current_tokens = 3
request_tokens = 2
if current_tokens >= request_tokens:
    current_tokens -= request_tokens
    allow request
else:
    block request

What happens if a request for 4 tokens arrives immediately?
medium
A. Request is allowed and tokens reduce to -1
B. Request is blocked because refill rate is too low
C. Request is allowed and tokens reduce to 1
D. Request is blocked because not enough tokens

Solution

  1. Step 1: Check current tokens against requested tokens

    Current tokens are 3, request needs 4 tokens, which is more than available.
  2. Step 2: Determine if request is allowed or blocked

    Since current tokens (3) < request tokens (4), the request is blocked.
  3. Final Answer:

    Request is blocked because not enough tokens -> Option D
  4. Quick Check:

    Tokens < request = block [OK]
Hint: Allow only if tokens ≥ requested tokens [OK]
Common Mistakes:
  • Allowing request when tokens are insufficient
  • Ignoring token count and refill rate
  • Assuming tokens can go negative
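The pseudocode in question 3 leaves out the refill step. A minimal sketch of a token bucket that also refills over time might look like the following; the class name `TokenBucket`, the method `try_consume`, and the explicit `now` argument (used instead of a real clock so the behavior is easy to follow) are assumptions for illustration.

```python
class TokenBucket:
    """Token bucket with lazy refill, driven by caller-supplied timestamps."""

    def __init__(self, capacity: float, refill_rate: float,
                 tokens: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = tokens
        self.last_refill = now

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding the capacity.
        if now > self.last_refill:
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now

    def try_consume(self, requested: float, now: float) -> bool:
        self._refill(now)
        if self.tokens >= requested:
            self.tokens -= requested
            return True
        return False  # tokens are left untouched, so they never go negative
```

Starting from the question's state (capacity 5, 1 token per second, 3 tokens), a request for 4 tokens is blocked immediately but succeeds one second later, once the bucket has refilled to 4.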
4. A microservice uses a sliding window rate limiter but users report some requests are blocked even when they seem under the limit. Which is the most likely cause?
medium
A. The sliding window is not updating timestamps correctly
B. The service has too many servers without shared state
C. The rate limit is set too high
D. The users are sending requests too slowly

Solution

  1. Step 1: Understand sliding window rate limiter behavior

    Sliding window requires accurate tracking of request timestamps across all servers to count requests correctly.
  2. Step 2: Identify issue with multiple servers and no shared state

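    If servers do not share state, each counts requests independently. When the global limit is split into per-server quotas, uneven load balancing can exhaust one server's quota and block a user whose total is still under the limit. (If each server instead enforces the full limit on its own, the error goes the other way and users can exceed it.)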
  3. Final Answer:

    The service has too many servers without shared state -> Option B
  4. Quick Check:

    Multiple servers need shared state for sliding window [OK]
Hint: Sliding window needs shared state across servers [OK]
Common Mistakes:
  • Blaming slow user requests
  • Assuming rate limit is too high causes blocking
  • Ignoring distributed state issues
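The effect described in question 4 can be reproduced with a small sliding window log. This is a sketch under assumptions: `SlidingWindowLog` is a hypothetical in-process class, timestamps are passed in explicitly, and the "per-server quota" scenario assumes the global limit is divided evenly among servers.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLog:
    """Keeps one timestamp per admitted request and counts those in the window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._log: Dict[str, Deque[float]] = {}

    def allow(self, user: str, now: float) -> bool:
        log = self._log.setdefault(user, deque())
        # Drop timestamps that have slid out of the window.
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True


# Global limit of 4 per minute, enforced two ways.
shared = SlidingWindowLog(limit=4, window_seconds=60)    # one shared state
server_a = SlidingWindowLog(limit=2, window_seconds=60)  # per-server quota of 4 / 2

# Three requests from one user that all happen to reach server A.
shared_results = [shared.allow("u", t) for t in (0, 1, 2)]
split_results = [server_a.allow("u", t) for t in (0, 1, 2)]
```

With shared state all three requests are admitted, while the server holding a local quota of 2 blocks the third even though the user is still under the global limit of 4.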
5. You design a rate limiter for a microservice that must handle 10 million users, each allowed 100 requests per hour. Which approach best balances accuracy and scalability?
hard
A. Use distributed token buckets with local caches and periodic sync
B. Use a centralized fixed window counter stored in a single database
C. Use client-side rate limiting without server checks
D. Use a sliding window log storing every request timestamp centrally

Solution

  1. Step 1: Analyze scalability needs for 10 million users

    A centralized fixed window counter in a single database (Option B) or a central log of every request timestamp (Option D) will cause bottlenecks and high latency at this scale.
  2. Step 2: Evaluate distributed token bucket with local caches

    Distributed token buckets with local caches reduce central load and sync periodically, balancing accuracy and scalability well.
  3. Step 3: Consider client-side rate limiting

    Client-side rate limiting without server checks (Option C) is unreliable because clients can bypass the limits.
  4. Final Answer:

    Use distributed token buckets with local caches and periodic sync -> Option A
  5. Quick Check:

    Distributed token bucket = scalable + accurate [OK]
Hint: Distributed token buckets scale best for millions [OK]
Common Mistakes:
  • Choosing centralized storage causing bottlenecks
  • Relying only on client-side limits
  • Storing all request logs centrally causing overload
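One possible shape for the approach in Option A is sketched below. Note the substitution: instead of a timer-driven periodic sync, this example uses on-demand leasing, where each node fetches a small batch of tokens from the central budget and serves requests locally until the batch runs out. `CentralBudget` and `NodeLimiter` are hypothetical names, the central store is an in-process stand-in, and the hourly reset of budgets is omitted.

```python
from typing import Dict


class CentralBudget:
    """Stand-in for the shared store holding each user's remaining tokens."""

    def __init__(self, tokens_per_user: int) -> None:
        self.tokens_per_user = tokens_per_user
        self._remaining: Dict[str, int] = {}

    def lease(self, user: str, want: int) -> int:
        """Grant up to `want` tokens from the user's remaining budget."""
        remaining = self._remaining.setdefault(user, self.tokens_per_user)
        granted = min(want, remaining)
        self._remaining[user] = remaining - granted
        return granted


class NodeLimiter:
    """One service node: serves from a local batch, refills from the center."""

    def __init__(self, central: CentralBudget, batch_size: int) -> None:
        self.central = central
        self.batch_size = batch_size
        self._local: Dict[str, int] = {}
        self.central_calls = 0  # how often this node contacted the center

    def allow(self, user: str) -> bool:
        if self._local.get(user, 0) == 0:
            self.central_calls += 1
            self._local[user] = self.central.lease(user, self.batch_size)
        if self._local[user] > 0:
            self._local[user] -= 1
            return True
        return False
```

With a budget of 100 and a batch size of 10, a node that serves 60 requests contacts the center only 6 times. The trade-off is accuracy: tokens leased to a node that receives no further traffic for that user are stranded, so larger batches mean fewer central calls but looser enforcement.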