Rate limiting in Microservices - System Design Exercise

Design: Rate Limiting System for Microservices

Design the rate limiting mechanism and its integration with microservices APIs. Out of scope: detailed API business logic, user authentication mechanisms.

Functional Requirements

FR1: Limit the number of requests a user or client can make to an API within a given time window

FR2: Support different rate limits for different users or API keys

FR3: Provide real-time feedback when limits are exceeded

FR4: Ensure rate limiting works correctly in a distributed microservices environment

FR5: Allow configuration changes without downtime
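
As an illustration of FR3, here is a minimal sketch of the feedback a limiter might return once a fixed window is exhausted. The function name and parameters are hypothetical. `Retry-After` is a standard HTTP header; the `X-RateLimit-*` headers are a common convention rather than a standard, and their use here is an assumption.

```python
import math


def too_many_requests(limit: int, window_start: float,
                      window_size: float, now: float) -> tuple[int, dict[str, str], str]:
    """Build a 429 response for a client that has used up its current window."""
    # Seconds until the current window ends and the counter resets (at least 1).
    retry_after = max(1, math.ceil(window_start + window_size - now))
    headers = {
        "Retry-After": str(retry_after),   # standard HTTP header
        "X-RateLimit-Limit": str(limit),   # conventional, not standardized
        "X-RateLimit-Remaining": "0",
    }
    return 429, headers, "Too Many Requests"


status, headers, body = too_many_requests(
    limit=100, window_start=1200.0, window_size=60.0, now=1230.5
)
print(status, headers["Retry-After"])
```

Telling the client when to retry lets well-behaved callers back off instead of retrying immediately and adding load.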

Non-Functional Requirements

NFR1: Handle up to 100,000 requests per second across all services

NFR2: Enforce limits with p99 latency under 50ms

NFR3: Achieve 99.9% availability

NFR4: Support horizontal scaling of microservices

NFR5: Avoid single points of failure
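
A quick back-of-the-envelope calculation helps turn NFR1 and NFR3 into concrete budgets. The sketch below assumes one counter update per request and a hypothetical six-shard store with evenly distributed keys; neither figure comes from the requirements.

```python
PEAK_RPS = 100_000          # NFR1
AVAILABILITY = 0.999        # NFR3
OPS_PER_REQUEST = 1         # assumption: one atomic counter update per request
SHARDS = 6                  # hypothetical shard count, keys assumed evenly spread

HOURS_PER_YEAR = 365 * 24

# Downtime allowed per year by a 99.9% availability target.
downtime_hours_per_year = (1 - AVAILABILITY) * HOURS_PER_YEAR

# Counter-store load implied by the peak request rate.
store_ops_per_second = PEAK_RPS * OPS_PER_REQUEST
ops_per_shard = store_ops_per_second / SHARDS

print(f"Downtime budget: {downtime_hours_per_year:.2f} hours/year")
print(f"Counter store load: {store_ops_per_second} ops/s, "
      f"about {ops_per_shard:.0f} ops/s per shard")
```

Under these assumptions the availability target leaves under nine hours of downtime per year, which is why the counter store and gateway both need redundancy.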

Think Before You Design

Key Components

API Gateway or Edge Proxy

Distributed Cache or In-memory Store (e.g., Redis)

Rate Limiter Service or Middleware

Configuration Management Service

Monitoring and Alerting System

Design Patterns

Token Bucket or Leaky Bucket algorithms

Fixed Window vs Sliding Window counters

Centralized vs Distributed rate limiting

Client-side vs Server-side enforcement

Circuit Breaker pattern for overload protection
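
Of the patterns listed above, the token bucket is often the first one worth being able to write down. The following is a minimal single-process sketch, not a distributed implementation; the class and field names are illustrative, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float      # maximum burst size
    refill_rate: float   # tokens added per second
    tokens: float        # current token balance
    last_refill: float   # timestamp of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = max(self.last_refill, now)
        # Spend tokens only if the request can be fully paid for.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=3, refill_rate=1.0, tokens=3, last_refill=0.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]  # three pass, the fourth is denied
later = bucket.allow(now=1.0)                      # one token has been refilled
print(burst, later)
```

The capacity controls how large a burst is tolerated, while the refill rate sets the sustained request rate, so the two can be tuned independently.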

Reference Architecture

Client
  |
  v
API Gateway / Edge Proxy (with Rate Limiter Middleware)
  |
  v
Microservices
  |
  v
Distributed Cache (Redis Cluster)
  |
  v
Configuration Service
  |
  v
Monitoring & Alerting

Components

API Gateway / Edge Proxy
  Technology: Envoy, NGINX, or Kong
  Responsibility: Intercept incoming requests and enforce rate limits before forwarding to microservices

Rate Limiter Middleware
  Technology: Custom middleware or Envoy filter
  Responsibility: Check and update request counts per user/key using the distributed cache

Distributed Cache
  Technology: Redis Cluster
  Responsibility: Store counters and timestamps for rate limiting with low latency

Configuration Service
  Technology: Central config store (e.g., Consul, etcd)
  Responsibility: Manage rate limit rules and allow dynamic updates

Monitoring & Alerting
  Technology: Prometheus + Grafana
  Responsibility: Track rate limit usage, errors, and system health

Request Flow

1. Client sends request to API Gateway

2. API Gateway extracts user identity or API key

3. Rate Limiter Middleware queries Redis to get current count for user/key

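4. If count is below limit, increment count and forward request to microservice (the check and the increment must happen as one atomic operation, so concurrent requests cannot both slip under the limit)

5. If count has reached the limit, respond with HTTP 429 Too Many Requests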

6. Configuration Service provides rate limit rules to middleware dynamically

7. Monitoring system collects metrics on rate limiting events and system performance
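
Steps 2 to 5 of the flow above can be sketched as a single handler. This is a simplified illustration: `InMemoryCounterStore` is a hypothetical stand-in for Redis whose `incr` mimics the atomicity of the Redis `INCR` command, and the sketch merges the read and increment of steps 3 and 4 into one atomic increment followed by a comparison. A real deployment would also set an expiry on each counter key.

```python
import threading
from typing import Callable


class InMemoryCounterStore:
    """Hypothetical stand-in for Redis; incr() is atomic, like Redis INCR."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}
        self._lock = threading.Lock()

    def incr(self, key: str) -> int:
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


def handle_request(store: InMemoryCounterStore, api_key: str, limit: int,
                   window_seconds: int, now: int,
                   forward: Callable[[], str]) -> tuple[int, str]:
    # Identify the fixed window this request falls into.
    window_start = now - (now % window_seconds)
    key = f"rate_limit:{api_key}:{window_start}"
    # Increment first, then compare: no gap between check and update.
    count = store.incr(key)
    if count > limit:
        return 429, "Too Many Requests"
    return 200, forward()


store = InMemoryCounterStore()
statuses = [
    handle_request(store, "key-1", limit=2, window_seconds=60, now=100,
                   forward=lambda: "ok")[0]
    for _ in range(3)
]
next_window = handle_request(store, "key-1", 2, 60, now=130, forward=lambda: "ok")[0]
print(statuses, next_window)
```

Incrementing before comparing means rejected requests also count toward the window, which is usually acceptable for a fixed-window limiter and keeps the hot path to one store call.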

Database Schema

Entities:
- User or API Key: id, rate_limit_policy_id
- Rate Limit Policy: id, max_requests, window_size_seconds
- Counters stored in Redis as keys: "rate_limit:{user_id}:{window_start_timestamp}" with integer count

Relationships:
- Each User/API Key references one Rate Limit Policy
- Counters are ephemeral and reset after window expires
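
The entities and key format above can be sketched as plain data classes plus a helper that derives the counter key. The class and function names are illustrative, and returning a TTL alongside the key is an assumption about how the counter would be made to expire at the end of its window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitPolicy:
    id: str
    max_requests: int
    window_size_seconds: int


@dataclass(frozen=True)
class ApiKey:
    id: str
    rate_limit_policy_id: str


def counter_key(user_id: str, policy: RateLimitPolicy, now: int) -> tuple[str, int]:
    """Return the counter key for the current window and the seconds until it ends."""
    window_start = now - (now % policy.window_size_seconds)
    ttl = window_start + policy.window_size_seconds - now
    return f"rate_limit:{user_id}:{window_start}", ttl


policies = {"free": RateLimitPolicy(id="free", max_requests=100, window_size_seconds=60)}
client = ApiKey(id="user42", rate_limit_policy_id="free")

key, ttl = counter_key(client.id, policies[client.rate_limit_policy_id], now=125)
print(key, ttl)
```

Because the window start is part of the key, each new window gets a fresh counter and old keys can simply expire instead of being reset.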

Scaling Discussion

Bottlenecks

Redis becoming a single point of failure or performance bottleneck

API Gateway overload due to high request volume

Latency increase due to network calls to distributed cache

Configuration updates causing inconsistent rate limits across nodes

Solutions

Use Redis Cluster with sharding and replication for high availability and throughput

Deploy multiple API Gateway instances behind a load balancer

Use local caches with short TTLs to reduce Redis calls, accepting that a node may briefly admit slightly more than the limit while its local view is stale

Implement versioned configuration with atomic updates and cache invalidation
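
The versioned-configuration idea in the last solution can be sketched as a store that swaps in a complete, read-only snapshot and ignores stale versions. `PolicyStore` and its methods are hypothetical names, and how updates are delivered from the configuration service is out of scope for this sketch.

```python
import threading
from types import MappingProxyType


class PolicyStore:
    """Holds one immutable (version, rules) snapshot and replaces it as a whole."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._snapshot = (0, MappingProxyType({}))

    def apply(self, version: int, rules: dict[str, int]) -> bool:
        with self._lock:
            if version <= self._snapshot[0]:
                return False  # stale or duplicate update, keep current rules
            # Copy the rules so later changes by the caller cannot leak in.
            self._snapshot = (version, MappingProxyType(dict(rules)))
            return True

    def limit_for(self, policy_id: str, default: int) -> int:
        # Read the snapshot reference once so version and rules stay consistent.
        _version, rules = self._snapshot
        return rules.get(policy_id, default)


store = PolicyStore()
applied_new = store.apply(2, {"free": 100, "pro": 1000})
applied_stale = store.apply(1, {"free": 50})  # arrives late, must be ignored
print(applied_new, applied_stale, store.limit_for("free", default=10))
```

Readers never see a half-applied rule set, and out-of-order updates cannot roll a node back to older limits.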

Interview Tips

Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing the architecture and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.

Explain different rate limiting algorithms and why you chose one

Discuss trade-offs between strict consistency and performance

Highlight how the design handles a distributed microservices environment

Mention how dynamic configuration and monitoring improve operability

Address potential bottlenecks and scaling strategies
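
When explaining an algorithm choice, one concrete contrast is how a fixed window can admit a burst across a window boundary while a sliding window log does not. The sketch below is a single-process illustration with hypothetical class names; the limit of 5 requests per 60 seconds is an arbitrary example.

```python
from collections import deque


class FixedWindow:
    def __init__(self, limit: int, window: float) -> None:
        self.limit, self.window = limit, window
        self.window_id, self.count = None, 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window)
        if window_id != self.window_id:
            # A new window started, so the counter resets.
            self.window_id, self.count = window_id, 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


class SlidingWindowLog:
    def __init__(self, limit: int, window: float) -> None:
        self.limit, self.window = limit, window
        self.events: deque[float] = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have fallen out of the trailing window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        if len(self.events) >= self.limit:
            return False
        self.events.append(now)
        return True


# Five requests just before the minute boundary, five just after it.
times = [59.0] * 5 + [61.0] * 5
fixed, sliding = FixedWindow(5, 60), SlidingWindowLog(5, 60)
fixed_allowed = sum(fixed.allow(t) for t in times)
sliding_allowed = sum(sliding.allow(t) for t in times)
print(fixed_allowed, sliding_allowed)
```

The fixed window admits all ten requests within two seconds, twice the nominal limit, whereas the sliding log admits only five at the cost of storing one timestamp per accepted request.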

Practice

1. What is the main purpose of rate limiting in microservices? (easy)

A. To control how many requests a user can make in a given time

B. To increase the speed of the service

C. To store user data securely

D. To balance the load between servers

Solution

Step 1: Understand the concept of rate limiting

Step 2: Identify the main goal of rate limiting

Final Answer:

Quick Check:
