
Cache stampede prevention in HLD - Scalability & System Analysis

Scalability Analysis - Cache stampede prevention
Growth Table: Cache Stampede Prevention
| Users / Traffic | Cache Hits | Cache Misses | Origin Server Load | Cache Stampede Risk | Latency Impact |
|---|---|---|---|---|---|
| 100 users | High (95%+) | Low | Low | Negligible | Low |
| 10,000 users | High (90%+) | Moderate | Moderate | Possible on hot keys | Moderate |
| 1,000,000 users | High (85%+) | High | High | High risk on popular keys | High |
| 100,000,000 users | High (80%+) | Very High | Very High | Severe risk, system overload | Very High |
First Bottleneck

The first bottleneck is the origin server or database behind the cache. When a popular cache entry expires while many users are requesting it, cache misses spike simultaneously, and the resulting surge of requests overwhelms the origin. This is called a cache stampede. The origin server's CPU, memory, or database connection pool is typically the first resource to saturate.
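The failure mode comes from the standard read-through pattern itself. A minimal sketch (using a plain dict as a stand-in for a real cache, and a hypothetical `fetch_from_origin` helper) shows the line where every concurrent miss pays the origin cost:

```python
import time

cache = {}  # key -> (value, expiry_timestamp); stand-in for a real cache store

def fetch_from_origin(key):
    # Expensive origin/database call; every concurrent miss pays this cost.
    return f"data-for-{key}"

def naive_get(key, ttl=60):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]  # cache hit
    # Cache miss: under load, many requests reach this line at the same
    # moment the entry expires -- each one hits the origin (the stampede).
    value = fetch_from_origin(key)
    cache[key] = (value, time.time() + ttl)
    return value
```

Nothing here is wrong for a single request; the problem only appears when thousands of requests fall through to `fetch_from_origin` in the same instant.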

Scaling Solutions
  • Mutex Locking: Use locks so only one request fetches data from origin while others wait for cache refill.
  • Request Coalescing: Combine multiple requests for the same key into one origin fetch.
  • Early Expiration: Refresh cache before expiry to avoid many misses at once.
  • Randomized Expiry: Add random time to cache TTL to spread expirations.
  • Backup Cache: Serve stale data temporarily while refreshing cache.
  • Distributed Locks: Use Redis or similar for locks in multi-server setups.
  • Cache Warming: Preload popular keys before traffic spikes.
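The first two solutions above can be combined with randomized expiry in a few lines. This is a single-process sketch, assuming an in-memory dict as the cache and a hypothetical `fetch_from_origin` helper; in a multi-server setup the per-key `threading.Lock` would be replaced by a distributed lock (e.g. in Redis):

```python
import random
import threading
import time

cache = {}                      # key -> (value, expiry_timestamp)
locks = {}                      # key -> per-key mutex
locks_guard = threading.Lock()  # protects the locks dict itself

def fetch_from_origin(key):
    # Stand-in for the expensive origin/database call.
    return f"data-for-{key}"

def get_with_lock(key, ttl=60, jitter=10):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]  # fast path: cache hit
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        # Re-check: another request may have refilled while we waited.
        entry = cache.get(key)
        if entry and entry[1] > time.time():
            return entry[0]
        value = fetch_from_origin(key)
        # Randomized expiry spreads out future expirations for hot keys.
        cache[key] = (value, time.time() + ttl + random.uniform(0, jitter))
        return value
```

The double-check inside the lock is what turns N concurrent misses into one origin fetch: the first request refills the cache, and every waiter that acquires the lock afterwards hits the fast path.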
Back-of-Envelope Cost Analysis
  • At 1M users, assume a 10% cache miss spike on a hot key: 100,000 origin requests within seconds.
  • Origin server handles ~5,000 QPS; 100,000 QPS burst overloads it 20x.
  • Network bandwidth: 1 Gbps (~125 MB/s) may saturate if responses are large.
  • Memory for locks and cache metadata is minimal compared to data size.
  • Implementing locks and randomized TTL adds negligible cost but prevents expensive origin overload.
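The arithmetic behind these estimates is worth showing explicitly. A short sketch, assuming a 0.5 MB response size purely for illustration:

```python
users = 1_000_000
miss_fraction = 0.10          # 10% of users miss on the hot key at once
origin_capacity_qps = 5_000   # what the origin can sustain

burst_requests = int(users * miss_fraction)              # 100,000 requests
overload_factor = burst_requests / origin_capacity_qps   # 20x over capacity

# Bandwidth check: 1 Gbps is ~125 MB/s. If responses are 0.5 MB
# (an illustrative assumption), the link carries only 250 responses/s.
bandwidth_mb_per_s = 125
response_size_mb = 0.5
link_capacity_qps = bandwidth_mb_per_s / response_size_mb
```

So even before the origin's CPU saturates, large responses can saturate the network link, which is why serving from cache (or serving stale data) matters.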
Interview Tip

Start by explaining what a cache stampede is and why it happens. Then identify the origin server as the first bottleneck. Discuss simple solutions like locking and randomized TTL. Finally, mention distributed systems challenges and how to handle them. Keep answers structured: problem, bottleneck, solutions, trade-offs.

Self Check

Your database handles 1000 QPS. Traffic grows 10x with many users requesting the same cached key at once. What do you do first?

Answer: Implement cache stampede prevention by adding locking or request coalescing so only one request hits the database while others wait for cache refresh. This prevents overload and keeps latency low.
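Request coalescing, the other half of that answer, can be sketched with an event per in-flight key: the first miss becomes the "leader" that fetches from the origin, and later misses for the same key wait on the leader's result instead of querying the database. A single-process sketch, assuming in-memory dicts and a hypothetical `fetch_from_origin` helper:

```python
import threading

cache = {}                       # key -> value (TTL omitted for brevity)
inflight = {}                    # key -> Event for the in-progress fetch
inflight_guard = threading.Lock()

def fetch_from_origin(key):
    # Stand-in for the expensive database call.
    return f"data-for-{key}"

def coalesced_get(key):
    if key in cache:
        return cache[key]
    with inflight_guard:
        event = inflight.get(key)
        leader = event is None
        if leader:
            # First miss for this key: this request will do the fetch.
            event = threading.Event()
            inflight[key] = event
    if leader:
        value = fetch_from_origin(key)
        cache[key] = value
        with inflight_guard:
            del inflight[key]
        event.set()      # wake all waiting followers
        return value
    event.wait()         # followers block until the leader finishes
    return cache[key]
```

With this in place, a 10x traffic spike on one hot key still produces exactly one database query per refresh, so the 1000 QPS database never sees the burst.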

Key Result
Cache stampede causes origin server overload when many cache misses happen simultaneously. Prevent it by locking, request coalescing, and randomized cache expiry to spread load and keep the system stable.