Rate limiting in Microservices - System Design Guide

Problem Statement

When too many requests hit a service at once, it can slow down or crash, causing poor user experience and downtime. Without control, abusive or accidental traffic spikes can overwhelm resources and break service availability.

Solution

Rate limiting controls how many requests a user or client can make in a given time window. It rejects or delays excess requests to keep the system stable and fair for all users.
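One simple way to picture "requests per time window" is a fixed-window counter. The sketch below is a minimal illustration, not a production limiter: the class name `FixedWindowLimiter`, its parameters, and the injectable `clock` are assumptions made for this example, and old window counters are never cleaned up.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window`-second window."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the sketch can be tested without sleeping
        self.counts = defaultdict(int)  # (client_id, window_index) -> request count

    def allow(self, client_id):
        # All timestamps inside the same window share one window index.
        window_index = int(self.clock() // self.window)
        key = (client_id, window_index)
        if self.counts[key] >= self.limit:
            return False  # over the limit: the caller rejects or delays the request
        self.counts[key] += 1
        return True
```

A caller would check `limiter.allow(client_id)` before doing any work and return an error (or queue the request) when it is `False`. Fixed windows are easy to reason about, but they let a client send up to twice the limit around a window boundary, which is one reason token bucket and sliding window variants exist.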

Architecture

Client → Rate Limiter → Service
              ↓
    Request Count (cache)

This diagram shows a client sending requests through a rate limiter before reaching the service. The rate limiter tracks request counts in a cache to enforce limits.
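The flow in the diagram can be sketched as a wrapper that consults a counter cache before calling the service. Everything here is hypothetical and chosen for illustration: `TTLCounterCache` is an in-memory stand-in for whatever cache a real deployment would use, and `rate_limited`, the `rl:` key prefix, and the `(status, body)` return shape are assumptions, not part of any library.

```python
import time


class TTLCounterCache:
    """Hypothetical in-memory stand-in for a cache with expiring counters."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # key -> (count, expires_at)

    def incr(self, key, ttl):
        now = self.clock()
        count, expires_at = self.entries.get(key, (0, now + ttl))
        if now >= expires_at:
            # The old counter has expired: start a fresh window.
            count, expires_at = 0, now + ttl
        count += 1
        self.entries[key] = (count, expires_at)
        return count


def rate_limited(service, cache, limit, window):
    """Wrap `service` so every call passes through the rate limiter first."""

    def handler(client_id, payload):
        if cache.incr(f"rl:{client_id}", ttl=window) > limit:
            return 429, "Rate limit exceeded"  # the service is never called
        return 200, service(payload)

    return handler
```

Keeping the counter behind a small `incr` interface means the in-memory stand-in could later be swapped for a shared cache without touching the handler.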

Trade-offs

✓ Pros

→ Prevents service overload by controlling request rates.
→ Protects against abuse and denial-of-service attacks.
→ Ensures fair resource usage among users.
→ Improves system stability and availability.

✗ Cons

→ Adds latency due to extra processing before requests reach the service.
→ Complexity in managing distributed rate limits across multiple servers.
→ Risk of blocking legitimate traffic if limits are too strict.
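The distributed-limits problem is easy to see in a small simulation. This sketch assumes three servers behind a round-robin balancer and a single time window; the `Counter` class and `count_accepted` function are hypothetical names used only for this illustration.

```python
from itertools import cycle


class Counter:
    """Minimal request counter; one instance models one store of limiter state."""

    def __init__(self):
        self.counts = {}

    def incr(self, client_id):
        self.counts[client_id] = self.counts.get(client_id, 0) + 1
        return self.counts[client_id]


def count_accepted(server_stores, client_id, total_requests, limit):
    """Send requests round-robin; each server checks the store it was given."""
    servers = cycle(server_stores)
    accepted = 0
    for _ in range(total_requests):
        store = next(servers)
        if store.incr(client_id) <= limit:
            accepted += 1
    return accepted


LIMIT = 5
local_stores = [Counter(), Counter(), Counter()]  # each server keeps private state
shared_store = Counter()
shared_stores = [shared_store] * 3                # all servers use one store

local_accepted = count_accepted(local_stores, "client-1", 30, LIMIT)    # 15
shared_accepted = count_accepted(shared_stores, "client-1", 30, LIMIT)  # 5
```

With private state, each of the three servers admits 5 requests, so the client gets 15 through instead of 5. A shared store enforces the intended limit, at the cost of an extra network hop and a new dependency.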

Use when your service faces unpredictable or high traffic spikes (as a rough guide, above 1000 requests per second), or when critical resources need protection from abuse.

Avoid if your system handles very low traffic (roughly under 100 requests per second) and the limiter's overhead outweighs its benefit, or if all clients are trusted and well-behaved.

Real World Examples

Twitter

Twitter applies rate limiting on its API endpoints to prevent abuse and ensure fair access for all developers.

Stripe

Stripe uses rate limiting to protect payment APIs from excessive calls that could cause service disruption or fraud.

Amazon

Amazon API Gateway enforces rate limits to maintain backend service stability during traffic surges.

Code Example

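The before code allows unlimited requests, risking overload. The after code implements a token bucket per client IP: each bucket holds up to 5 tokens, refills at a rate of 5 tokens per 10 seconds, and every request spends one token. A client can therefore burst up to 5 requests and then continue at the refill rate; requests that arrive when the bucket is empty are rejected with HTTP 429 (Too Many Requests).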

### Before (no rate limiting)
from flask import Flask
app = Flask(__name__)

@app.route('/api')
def api():
    return 'Success'


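### After (with simple token bucket rate limiting)
import threading
import time
from flask import Flask, jsonify, request
app = Flask(__name__)

RATE_LIMIT = 5    # bucket capacity: maximum burst size, in requests
TIME_WINDOW = 10  # seconds needed to refill an empty bucket
REFILL_RATE = RATE_LIMIT / TIME_WINDOW  # tokens added per second

# Per-IP bucket state, kept in process memory. It is not shared across
# workers or servers, and entries are never evicted.
clients = {}
lock = threading.Lock()  # Flask may serve requests from several threads

@app.route('/api')
def api():
    client_ip = request.remote_addr
    now = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    with lock:
        # A new client starts with a full bucket.
        bucket = clients.setdefault(client_ip, {'tokens': RATE_LIMIT, 'last': now})
        # Refill in proportion to the time elapsed, capped at the capacity.
        elapsed = now - bucket['last']
        bucket['tokens'] = min(RATE_LIMIT, bucket['tokens'] + elapsed * REFILL_RATE)
        bucket['last'] = now

        if bucket['tokens'] < 1:
            return jsonify({'error': 'Rate limit exceeded'}), 429
        # Spend one token for this request.
        bucket['tokens'] -= 1
    return 'Success'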

Output: Success
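To see how the token bucket behaves over time without running a web server, the refill arithmetic can be pulled into a small class. This is a sketch under assumptions: the `TokenBucket` name, its parameters, and the injectable `clock` are introduced here for illustration and are not part of Flask or the example above.

```python
import time


class TokenBucket:
    """Token bucket for a single client: burst up to `capacity`, then a steady rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so behavior can be checked without sleeping
        self.tokens = float(capacity)  # a new bucket starts full
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding the capacity.
        refilled = self.tokens + (now - self.last) * self.refill_per_second
        self.tokens = min(self.capacity, refilled)
        self.last = now
        if self.tokens < 1:
            return False  # bucket empty: reject this request
        self.tokens -= 1
        return True
```

With `capacity=5` and `refill_per_second=0.5` (the same 5 requests per 10 seconds as above), a client can send 5 requests immediately, is rejected on the sixth, and earns one more request for every 2 seconds it waits.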

Alternatives

Circuit Breaker

A circuit breaker stops sending requests to a backend after repeated failures to prevent cascading errors, whereas rate limiting controls request volume regardless of failures.

Use when: backend failures are frequent and you want to fail fast rather than only limit traffic.
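For contrast with a rate limiter, here is a minimal circuit breaker sketch. It reacts to failures rather than request volume. The class name, the `failure_threshold` and `reset_timeout` parameters, and the use of `RuntimeError` to signal an open circuit are all assumptions made for this illustration.

```python
import time


class CircuitBreaker:
    """Fail fast after repeated backend failures, then retry after a cool-down."""

    def __init__(self, failure_threshold, reset_timeout, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: do not touch the backend at all.
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Note that the breaker never counts requests per client: a single well-behaved caller trips it if the backend keeps failing, while a flood of successful calls does not.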

Load Balancing

Load balancing distributes traffic evenly across servers, but does not limit total request volume per client.

Use when: you need to scale horizontally; combine it with rate limiting to control per-client usage.

Summary

Rate limiting prevents system overload by controlling request rates per client.

It protects services from abuse and ensures fair resource usage.

Implementing rate limiting improves stability but adds complexity and potential latency.

Practice

1. What is the main purpose of rate limiting in microservices? (easy)

A. To control how many requests a user can make in a given time

B. To increase the speed of the service

C. To store user data securely

D. To balance the load between servers
