
Design a rate limiter in HLD - System Design Guide

Problem Statement
When a service receives too many requests in a short time, it can become overwhelmed, causing slow responses or crashes. Without control, a few users or faulty clients can consume all resources, making the system unavailable for others.
Solution
A rate limiter controls how many requests a user or client can make in a given time window. It tracks request counts and blocks or delays requests that exceed the allowed limit, protecting the system from overload and ensuring fair usage.
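The idea above can be sketched with a token-bucket limiter, one of the common rate-limiting algorithms: tokens refill at a steady rate, each request consumes one token, and requests are rejected once the bucket is empty. This is a minimal single-process sketch; the class and parameter names are illustrative, not from any particular library.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allows bursts up to `capacity`,
    then throttles to `refill_rate` requests per second."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Credit tokens accrued since the last check, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # spend one token for this request
            return True
        return False                      # bucket empty: reject or delay
```

With `TokenBucket(capacity=3, refill_rate=1.0)`, the first three calls to `allow()` succeed immediately and the fourth is rejected until roughly a second has passed.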
Architecture
Clients → Rate Limiter ⇄ Storage → Backend

This diagram shows clients sending requests to a rate limiter, which checks request counts stored in a storage system before forwarding allowed requests to the backend.
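The flow in the diagram could be sketched with a fixed-window counter. Here a plain dict stands in for the storage layer; in a real deployment this would typically be a shared store such as Redis, with the increment done atomically. The names `is_allowed`, `LIMIT`, and `WINDOW_SECONDS` are illustrative assumptions.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60   # length of each counting window
LIMIT = 100           # max requests per client per window

# Maps (client_id, window_number) -> request count.
storage = defaultdict(int)

def is_allowed(client_id, now=None):
    """Return True if the request should be forwarded to the backend."""
    if now is None:
        now = time.time()
    window = int(now // WINDOW_SECONDS)   # which window this request falls in
    key = (client_id, window)
    storage[key] += 1                     # record the request in storage
    return storage[key] <= LIMIT          # forward only while under the limit
```

Each incoming request increments the client's counter for the current window and is forwarded only if the count is still within the limit; when a new window starts, the count effectively resets.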

Trade-offs
✓ Pros
Prevents system overload by limiting request rates.
Ensures fair resource usage among users.
Protects backend services from spikes and abuse.
Can improve overall system stability and user experience.
✗ Cons
Adds latency due to request checking.
Requires additional storage and synchronization for counters.
Complexity increases with distributed systems needing consistent state.
Use when your system faces high traffic with potential bursts or abuse, typically above hundreds or thousands of requests per second, or when backend stability is critical.
Avoid if your system has very low traffic (under 100 requests per second) or if strict request limits would harm user experience more than occasional overload.
Real World Examples
Amazon
Amazon uses rate limiting on its APIs to prevent excessive calls from a single client that could degrade service for others.
Twitter
Twitter applies rate limiting to control how many tweets or API requests a user can make in a time window to prevent spam and abuse.
Stripe
Stripe enforces rate limits on payment API calls to protect backend systems from sudden spikes and ensure transaction reliability.
Alternatives
Circuit Breaker
Stops requests temporarily after failures rather than limiting request rate.
Use when: Backend failures or errors are the main concern, not request volume.
Load Balancing
Distributes requests evenly across servers but does not limit request rate per client.
Use when: You want to spread load across servers but do not need to restrict per-client request frequency.
Backpressure
Slows down request processing dynamically rather than outright rejecting excess requests.
Use when: You want to degrade service gracefully instead of hard-blocking excess requests.
Summary
Rate limiting prevents system overload by controlling request frequency per client.
It ensures fair usage and protects backend services from spikes and abuse.
Implementing rate limiting improves system stability and user experience under high load.