| Users | Without Cache | With Cache | Latency Impact |
|---|---|---|---|
| 100 | Low latency (~50ms) | Very low latency (~10ms) | Cache reduces latency by 80% |
| 10,000 | Moderate latency (~200ms) | Low latency (~30ms) | Cache reduces latency by 85% |
| 1,000,000 | High latency (~500ms) | Moderate latency (~50ms) | Cache reduces latency by 90% |
| 100,000,000 | Very high latency (seconds) | High latency (~200ms) | Cache reduces latency by 80-90% |
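The "Latency Impact" column follows directly from the latency figures in the two middle columns. A quick sanity check (the millisecond numbers are the table's illustrative estimates, not benchmarks):

```python
# Verify the table's reduction percentages: reduction = (without - with) / without.
rows = [
    (100, 50, 10),        # users, latency without cache (ms), with cache (ms)
    (10_000, 200, 30),
    (1_000_000, 500, 50),
]

for users, without_ms, with_ms in rows:
    reduction = (without_ms - with_ms) / without_ms * 100
    print(f"{users:>9} users: {without_ms}ms -> {with_ms}ms ({reduction:.0f}% lower)")
```

This reproduces the 80%, 85%, and 90% figures in the table.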
Why caching reduces latency in HLD: scalability evidence
As the number of users grows, the database becomes the first bottleneck: fetching data from disk and running complex queries takes longer, and network round trips add to the latency. Without caching, every request hits the database, so responses slow down under load.
- Cache Layer: Store frequently requested data in fast memory (like Redis or Memcached) to serve requests instantly.
- Read Replicas: Use database replicas to distribute read load, though an in-memory cache read is still faster than a replica query.
- CDN: Cache static content closer to users to reduce network delay.
- Cache Invalidation: Keep cache updated to avoid stale data.
- Horizontal Scaling: Add more cache servers to handle more requests.
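The cache-layer idea above is usually implemented as the cache-aside (lazy loading) pattern: check the cache first, fall back to the database on a miss, then populate the cache. A minimal sketch, where a plain dict and `db_query` stand in for Redis/Memcached and a real database:

```python
import time

cache: dict[str, tuple[float, str]] = {}  # key -> (expiry timestamp, value)
TTL_SECONDS = 60                          # expiry doubles as crude invalidation
db_calls = 0                              # count real database round trips

def db_query(key: str) -> str:
    """Stand-in for a slow, disk-bound database read."""
    global db_calls
    db_calls += 1
    return f"row-for-{key}"

def get(key: str) -> str:
    entry = cache.get(key)
    if entry and entry[0] > time.time():  # cache hit, still fresh
        return entry[1]
    value = db_query(key)                 # cache miss: go to the database
    cache[key] = (time.time() + TTL_SECONDS, value)
    return value

get("user:42")   # miss -> one database call
get("user:42")   # hit  -> served from memory, no database call
```

The TTL is the simplest invalidation strategy; explicit deletes on write (write-through or delete-on-update) keep data fresher at the cost of more coordination.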
Assuming 1 million queries per second (QPS):
- Without cache: the database must absorb all 1M QPS and is almost certainly overloaded (a typical single database node tops out around ~10K QPS).
- With cache: at a 90% hit rate the cache serves 900K QPS, and the database sees only 100K QPS.
- Latency drops from ~500ms (disk-bound query) to ~50ms (memory read) per request.
- Network bandwidth reduced as fewer DB queries cross network.
- Cache memory needed depends on the size of the hot working set; e.g., ~100GB of RAM to hold the hot data.
Start by explaining the latency problem and why the database is the bottleneck. Then introduce caching as a solution to serve data faster from memory. Discuss cache hit rates, invalidation strategies, and how caching reduces load on the database and network. Finally, mention scaling cache horizontally and using CDNs for static content.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Introduce a caching layer to serve frequent requests from memory, reducing load on the database and lowering latency.
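Even before standing up Redis, an in-process cache on the hottest read path demonstrates the effect. A sketch using Python's `functools.lru_cache`; `get_user` and its return value are hypothetical stand-ins for a real database lookup:

```python
from functools import lru_cache

db_hits = 0  # count how many requests actually reach the "database"

@lru_cache(maxsize=10_000)          # keep the 10K hottest keys in memory
def get_user(user_id: int) -> tuple:
    global db_hits
    db_hits += 1                    # each call here is a real DB round trip
    return (user_id, f"user-{user_id}")  # stand-in for a query result

# 10,000 requests for the same hot key result in a single database hit.
for _ in range(10_000):
    get_user(42)
print(db_hits)  # 1
```

For the 10x-traffic scenario in the question, a shared cache (Redis/Memcached) plays the same role across many application servers, which is why it is the first lever to pull.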