
Cache stampede prevention in HLD - Scalability & System Analysis

Scalability Analysis - Cache stampede prevention
Growth Table: Cache Stampede Prevention
| Users / Traffic | Cache Hits | Cache Misses | Origin Server Load | Cache Stampede Risk | Latency Impact |
|---|---|---|---|---|---|
| 100 users | High (95%+) | Low | Low | Negligible | Low |
| 10,000 users | High (90%+) | Moderate | Moderate | Possible on hot keys | Moderate |
| 1,000,000 users | High (85%+) | High | High | High risk on popular keys | High |
| 100,000,000 users | High (80%+) | Very High | Very High | Severe risk, system overload | Very High |
First Bottleneck

The first bottleneck is the origin server or database behind the cache. When a popular cache entry expires while many users are requesting it, cache misses spike simultaneously, and the resulting surge of requests overwhelms the origin. This is called a cache stampede. The origin server's CPU, memory, or database connection pool is typically the first resource to saturate.
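The failure mode comes from the standard read-through pattern itself. A minimal sketch (using a plain dict as a stand-in for a real cache, and a hypothetical `fetch_from_origin` helper) shows the line where every concurrent miss pays the origin cost:

```python
import time

cache = {}  # key -> (value, expiry_timestamp); stand-in for a real cache store

def fetch_from_origin(key):
    # Expensive origin/database call; every concurrent miss pays this cost.
    return f"data-for-{key}"

def naive_get(key, ttl=60):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]  # cache hit
    # Cache miss: under load, many requests reach this line at the same
    # moment the entry expires -- each one hits the origin (the stampede).
    value = fetch_from_origin(key)
    cache[key] = (value, time.time() + ttl)
    return value
```

Nothing here is wrong for a single request; the problem only appears when thousands of requests fall through to `fetch_from_origin` in the same instant.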

Scaling Solutions
  • Mutex Locking: Use locks so only one request fetches data from origin while others wait for cache refill.
  • Request Coalescing: Combine multiple requests for the same key into one origin fetch.
  • Early Expiration: Refresh cache before expiry to avoid many misses at once.
  • Randomized Expiry: Add random time to cache TTL to spread expirations.
  • Backup Cache: Serve stale data temporarily while refreshing cache.
  • Distributed Locks: Use Redis or similar for locks in multi-server setups.
  • Cache Warming: Preload popular keys before traffic spikes.
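The first two solutions above can be combined with randomized expiry in a few lines. This is a single-process sketch, assuming an in-memory dict as the cache and a hypothetical `fetch_from_origin` helper; in a multi-server setup the per-key `threading.Lock` would be replaced by a distributed lock (e.g. in Redis):

```python
import random
import threading
import time

cache = {}                      # key -> (value, expiry_timestamp)
locks = {}                      # key -> per-key mutex
locks_guard = threading.Lock()  # protects the locks dict itself

def fetch_from_origin(key):
    # Stand-in for the expensive origin/database call.
    return f"data-for-{key}"

def get_with_lock(key, ttl=60, jitter=10):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]  # fast path: cache hit
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        # Re-check: another request may have refilled while we waited.
        entry = cache.get(key)
        if entry and entry[1] > time.time():
            return entry[0]
        value = fetch_from_origin(key)
        # Randomized expiry spreads out future expirations for hot keys.
        cache[key] = (value, time.time() + ttl + random.uniform(0, jitter))
        return value
```

The double-check inside the lock is what turns N concurrent misses into one origin fetch: the first request refills the cache, and every waiter that acquires the lock afterwards hits the fast path.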
Back-of-Envelope Cost Analysis
  • At 1M users, assume a 10% cache miss spike on a hot key: 100,000 origin requests within seconds.
  • Origin server handles ~5,000 QPS; 100,000 QPS burst overloads it 20x.
  • Network bandwidth: 1 Gbps (~125 MB/s) may saturate if responses are large.
  • Memory for locks and cache metadata is minimal compared to data size.
  • Implementing locks and randomized TTL adds negligible cost but prevents expensive origin overload.
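The arithmetic behind these estimates is worth showing explicitly. A short sketch, assuming a 0.5 MB response size purely for illustration:

```python
users = 1_000_000
miss_fraction = 0.10          # 10% of users miss on the hot key at once
origin_capacity_qps = 5_000   # what the origin can sustain

burst_requests = int(users * miss_fraction)              # 100,000 requests
overload_factor = burst_requests / origin_capacity_qps   # 20x over capacity

# Bandwidth check: 1 Gbps is ~125 MB/s. If responses are 0.5 MB
# (an illustrative assumption), the link carries only 250 responses/s.
bandwidth_mb_per_s = 125
response_size_mb = 0.5
link_capacity_qps = bandwidth_mb_per_s / response_size_mb
```

So even before the origin's CPU saturates, large responses can saturate the network link, which is why serving from cache (or serving stale data) matters.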
Interview Tip

Start by explaining what a cache stampede is and why it happens. Then identify the origin server as the first bottleneck. Discuss simple solutions like locking and randomized TTL. Finally, mention distributed systems challenges and how to handle them. Keep answers structured: problem, bottleneck, solutions, trade-offs.

Self Check

Your database handles 1000 QPS. Traffic grows 10x with many users requesting the same cached key at once. What do you do first?

Answer: Implement cache stampede prevention by adding locking or request coalescing so only one request hits the database while others wait for cache refresh. This prevents overload and keeps latency low.
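Request coalescing, the other half of that answer, can be sketched with an event per in-flight key: the first miss becomes the "leader" that fetches from the origin, and later misses for the same key wait on the leader's result instead of querying the database. A single-process sketch, assuming in-memory dicts and a hypothetical `fetch_from_origin` helper:

```python
import threading

cache = {}                       # key -> value (TTL omitted for brevity)
inflight = {}                    # key -> Event for the in-progress fetch
inflight_guard = threading.Lock()

def fetch_from_origin(key):
    # Stand-in for the expensive database call.
    return f"data-for-{key}"

def coalesced_get(key):
    if key in cache:
        return cache[key]
    with inflight_guard:
        event = inflight.get(key)
        leader = event is None
        if leader:
            # First miss for this key: this request will do the fetch.
            event = threading.Event()
            inflight[key] = event
    if leader:
        value = fetch_from_origin(key)
        cache[key] = value
        with inflight_guard:
            del inflight[key]
        event.set()      # wake all waiting followers
        return value
    event.wait()         # followers block until the leader finishes
    return cache[key]
```

With this in place, a 10x traffic spike on one hot key still produces exactly one database query per refresh, so the 1000 QPS database never sees the burst.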

Key Result
Cache stampede causes origin server overload when many cache misses happen simultaneously. Prevent it by locking, request coalescing, and randomized cache expiry to spread load and keep the system stable.