| Users / Traffic | Cache Hit Rate | Cache Miss Volume | Origin Server Load | Cache Stampede Risk | Latency Impact |
|---|---|---|---|---|---|
| 100 users | High (95%+) | Low | Low | Negligible | Low |
| 10,000 users | High (90%+) | Moderate | Moderate | Possible on hot keys | Moderate |
| 1,000,000 users | High (85%+) | High | High | High risk on popular keys | High |
| 100,000,000 users | High (80%+) | Very High | Very High | Severe risk, system overload | Very High |
## Cache stampede prevention in HLD - Scalability & System Analysis
The first bottleneck is the origin server or database behind the cache. When many users request the same data and its cache entry expires, all of those requests miss at once, and the resulting surge of identical requests hits the origin simultaneously, overwhelming it. This is called a cache stampede (also known as a thundering herd). The origin's CPU, memory, or database connection pool saturates first.
- Mutex Locking: Use locks so only one request fetches data from origin while others wait for cache refill.
- Request Coalescing: Combine multiple requests for the same key into one origin fetch.
- Early Expiration: Refresh cache before expiry to avoid many misses at once.
- Randomized Expiry: Add random time to cache TTL to spread expirations.
- Backup Cache: Serve stale data temporarily while refreshing cache.
- Distributed Locks: Use Redis or similar for locks in multi-server setups.
- Cache Warming: Preload popular keys before traffic spikes.
- At 1M users, assume a 10% cache-miss spike on a hot key: 100,000 origin requests within seconds.
- If the origin server handles ~5,000 QPS, a 100,000 QPS burst overloads it 20x.
- Network bandwidth: a 1 Gbps link (~125 MB/s) may saturate if responses are large.
- Memory for locks and cache metadata is minimal compared to data size.
- Implementing locks and randomized TTL adds negligible cost but prevents expensive origin overload.
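These estimates can be sanity-checked with quick arithmetic. All inputs are the assumed figures above; the 10 KB response size is an illustrative assumption, not a measurement:

```python
users = 1_000_000
miss_fraction = 0.10                       # assumed 10% miss spike on a hot key
burst_requests = int(users * miss_fraction)

origin_capacity_qps = 5_000
overload_factor = burst_requests / origin_capacity_qps  # if the burst lands in ~1 second

link_bytes_per_sec = 1_000_000_000 / 8     # 1 Gbps link ~ 125 MB/s
response_bytes = 10 * 1024                 # assumption: 10 KB per response
max_responses_per_sec = link_bytes_per_sec // response_bytes

print(burst_requests)         # 100000 requests hitting the origin
print(overload_factor)        # 20.0x over capacity
print(max_responses_per_sec)  # ~12k/s: the link also caps out well below the burst
```

So even before the origin's CPU saturates, a 1 Gbps link serving 10 KB responses tops out around 12,000 responses per second, far below the 100,000-request burst.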
Start by explaining what a cache stampede is and why it happens. Then identify the origin server as the first bottleneck. Discuss simple solutions like locking and randomized TTL. Finally, mention distributed systems challenges and how to handle them. Keep answers structured: problem, bottleneck, solutions, trade-offs.
Your database handles 1000 QPS. Traffic grows 10x with many users requesting the same cached key at once. What do you do first?
Answer: Implement cache stampede prevention: add locking or request coalescing so that only one request hits the database while the others wait for the cache to refresh. This caps origin load and keeps latency low.