| Users | Without Cache | With Cache | Latency Impact |
|---|---|---|---|
| 100 | Low latency (~50ms) | Very low latency (~10ms) | Cache reduces latency by 80% |
| 10,000 | Moderate latency (~200ms) | Low latency (~30ms) | Cache reduces latency by 85% |
| 1,000,000 | High latency (~500ms) | Moderate latency (~50ms) | Cache reduces latency by 90% |
| 100,000,000 | Very high latency (seconds) | High latency (~200ms) | Cache reduces latency by 80-90% |
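The "Latency Impact" column follows directly from the latency figures in the two middle columns. A quick sanity check (the millisecond numbers are the table's illustrative estimates, not benchmarks):

```python
# Verify the table's reduction percentages: reduction = (without - with) / without.
rows = [
    (100, 50, 10),        # users, latency without cache (ms), with cache (ms)
    (10_000, 200, 30),
    (1_000_000, 500, 50),
]

for users, without_ms, with_ms in rows:
    reduction = (without_ms - with_ms) / without_ms * 100
    print(f"{users:>9} users: {without_ms}ms -> {with_ms}ms ({reduction:.0f}% lower)")
```

This reproduces the 80%, 85%, and 90% figures in the table.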
Why caching reduces latency in HLD: scalability evidence
As the number of users grows, the database becomes the first bottleneck: fetching data from disk and running complex queries takes longer, and network round trips add to the latency. Without caching, every request hits the database, so responses slow down under load.
- Cache Layer: Store frequently requested data in fast memory (like Redis or Memcached) to serve requests instantly.
- Read Replicas: Use database replicas to distribute read load, though an in-memory cache read is still faster than a replica query.
- CDN: Cache static content closer to users to reduce network delay.
- Cache Invalidation: Keep cache updated to avoid stale data.
- Horizontal Scaling: Add more cache servers to handle more requests.
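The cache-layer idea above is usually implemented as the cache-aside (lazy loading) pattern: check the cache first, fall back to the database on a miss, then populate the cache. A minimal sketch, where a plain dict and `db_query` stand in for Redis/Memcached and a real database:

```python
import time

cache: dict[str, tuple[float, str]] = {}  # key -> (expiry timestamp, value)
TTL_SECONDS = 60                          # expiry doubles as crude invalidation
db_calls = 0                              # count real database round trips

def db_query(key: str) -> str:
    """Stand-in for a slow, disk-bound database read."""
    global db_calls
    db_calls += 1
    return f"row-for-{key}"

def get(key: str) -> str:
    entry = cache.get(key)
    if entry and entry[0] > time.time():  # cache hit, still fresh
        return entry[1]
    value = db_query(key)                 # cache miss: go to the database
    cache[key] = (time.time() + TTL_SECONDS, value)
    return value

get("user:42")   # miss -> one database call
get("user:42")   # hit  -> served from memory, no database call
```

The TTL is the simplest invalidation strategy; explicit deletes on write (write-through or delete-on-update) keep data fresher at the cost of more coordination.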
Assuming 1 million queries per second (QPS):
- Without cache: the database must absorb all 1M QPS and is almost certainly overloaded (a typical single database node tops out around ~10K QPS).
- With cache: at a 90% hit rate the cache serves 900K QPS, and the database sees only 100K QPS.
- Latency drops from ~500ms (disk-bound query) to ~50ms (memory read) per request.
- Network bandwidth reduced as fewer DB queries cross network.
- Cache memory needed depends on the size of the hot working set; e.g., ~100GB of RAM to hold the hot data.
Start by explaining the latency problem and why the database is the bottleneck. Then introduce caching as a solution to serve data faster from memory. Discuss cache hit rates, invalidation strategies, and how caching reduces load on the database and network. Finally, mention scaling cache horizontally and using CDNs for static content.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Introduce a caching layer to serve frequent requests from memory, reducing load on the database and lowering latency.
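Even before standing up Redis, an in-process cache on the hottest read path demonstrates the effect. A sketch using Python's `functools.lru_cache`; `get_user` and its return value are hypothetical stand-ins for a real database lookup:

```python
from functools import lru_cache

db_hits = 0  # count how many requests actually reach the "database"

@lru_cache(maxsize=10_000)          # keep the 10K hottest keys in memory
def get_user(user_id: int) -> tuple:
    global db_hits
    db_hits += 1                    # each call here is a real DB round trip
    return (user_id, f"user-{user_id}")  # stand-in for a query result

# 10,000 requests for the same hot key result in a single database hit.
for _ in range(10_000):
    get_user(42)
print(db_hits)  # 1
```

For the 10x-traffic scenario in the question, a shared cache (Redis/Memcached) plays the same role across many application servers, which is why it is the first lever to pull.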