
Design a rate limiter in HLD - System Design Guide

Problem Statement
When a service receives too many requests in a short time, it can become overwhelmed, causing slow responses or crashes. Without control, a few users or faulty clients can consume all resources, making the system unavailable for others.
Solution
A rate limiter controls how many requests a user or client can make in a given time window. It tracks request counts and blocks or delays requests that exceed the allowed limit, protecting the system from overload and ensuring fair usage.
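The idea above can be sketched with a token-bucket limiter, one of the common rate-limiting algorithms: tokens refill at a steady rate, each request consumes one token, and requests are rejected once the bucket is empty. This is a minimal single-process sketch; the class and parameter names are illustrative, not from any particular library.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allows bursts up to `capacity`,
    then throttles to `refill_rate` requests per second."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Credit tokens accrued since the last check, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # spend one token for this request
            return True
        return False                      # bucket empty: reject or delay
```

With `TokenBucket(capacity=3, refill_rate=1.0)`, the first three calls to `allow()` succeed immediately and the fourth is rejected until roughly a second has passed.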
Architecture
Clients → Rate Limiter ⇄ Storage → Backend

This diagram shows clients sending requests to a rate limiter, which checks request counts stored in a storage system before forwarding allowed requests to the backend.
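The flow in the diagram could be sketched with a fixed-window counter. Here a plain dict stands in for the storage layer; in a real deployment this would typically be a shared store such as Redis, with the increment done atomically. The names `is_allowed`, `LIMIT`, and `WINDOW_SECONDS` are illustrative assumptions.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60   # length of each counting window
LIMIT = 100           # max requests per client per window

# Maps (client_id, window_number) -> request count.
storage = defaultdict(int)

def is_allowed(client_id, now=None):
    """Return True if the request should be forwarded to the backend."""
    if now is None:
        now = time.time()
    window = int(now // WINDOW_SECONDS)   # which window this request falls in
    key = (client_id, window)
    storage[key] += 1                     # record the request in storage
    return storage[key] <= LIMIT          # forward only while under the limit
```

Each incoming request increments the client's counter for the current window and is forwarded only if the count is still within the limit; when a new window starts, the count effectively resets.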

Trade-offs
✓ Pros
Prevents system overload by limiting request rates.
Ensures fair resource usage among users.
Protects backend services from spikes and abuse.
Can improve overall system stability and user experience.
✗ Cons
Adds latency due to request checking.
Requires additional storage and synchronization for counters.
Complexity increases with distributed systems needing consistent state.
Use when your system faces high traffic with potential bursts or abuse, typically above hundreds or thousands of requests per second, or when backend stability is critical.
Avoid if your system has very low traffic (under 100 requests per second) or if strict request limits would harm user experience more than occasional overload.
Real World Examples
Amazon
Amazon uses rate limiting on its APIs to prevent excessive calls from a single client that could degrade service for others.
Twitter
Twitter applies rate limiting to control how many tweets or API requests a user can make in a time window to prevent spam and abuse.
Stripe
Stripe enforces rate limits on payment API calls to protect backend systems from sudden spikes and ensure transaction reliability.
Alternatives
Circuit Breaker
Stops requests temporarily after failures rather than limiting request rate.
Use when: Backend failures or errors are the main concern, not request volume.
Load Balancing
Distributes requests evenly across servers but does not limit request rate per client.
Use when: You want to spread load across servers but do not need to restrict per-client request frequency.
Backpressure
Slows down request processing dynamically rather than outright rejecting excess requests.
Use when: You want to degrade service gracefully instead of hard-blocking excess requests.
Summary
Rate limiting prevents system overload by controlling request frequency per client.
It ensures fair usage and protects backend services from spikes and abuse.
Implementing rate limiting improves system stability and user experience under high load.