
Distributed caching (Redis, Memcached) in HLD - Scalability & System Analysis

Scalability Analysis - Distributed caching (Redis, Memcached)
Growth Table: Distributed Caching
| Users / Traffic | Cache Size | Cache Hit Rate | Latency | Cache Nodes | Eviction Policy |
|---|---|---|---|---|---|
| 100 users | Small (MBs) | High (~90%) | Low (~1ms) | 1 node | LRU or TTL |
| 10K users | Medium (GBs) | High (~85-90%) | Low (~1-2ms) | 3-5 nodes | LRU + TTL |
| 1M users | Large (10s of GBs) | Moderate (~80%) | Low (~2-3ms) | 10-20 nodes, sharded | LRU + TTL + LFU |
| 100M users | Very Large (100s of GBs to TBs) | Moderate (~75-80%) | Low (~3-5ms) | 50+ nodes, sharded + replicated | Advanced eviction + tiered caching |
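The LRU policy that appears throughout the eviction column can be sketched in a few lines. This is an illustrative toy built on an ordered dict, not how Redis actually evicts (Redis uses approximate LRU via sampling):

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: evicts the least-recently-used key once capacity is hit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                     # cache miss
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
```

In practice you would combine this with a TTL (as in the table) so stale entries expire even if they are frequently accessed.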
First Bottleneck

At small scale, a single node easily handles all cache requests. As traffic grows beyond roughly 10K users, the first bottleneck is the cache node's memory and CPU. A single Redis or Memcached instance can handle around 100K operations per second but has limited memory (tens of GBs). When cache size or QPS grows beyond this, the node becomes CPU- or memory-bound.

Also, network bandwidth between app servers and cache nodes can become a bottleneck at very high QPS (millions of ops/sec).

Scaling Solutions
  • Horizontal scaling: Add more cache nodes and shard keys across them to distribute load and memory usage.
  • Replication: Use read replicas for read-heavy workloads to increase throughput and availability.
  • Eviction policies: Use LRU, LFU, TTL to keep cache size manageable and remove stale data.
  • Client-side sharding: Distribute keys at the client or proxy layer to avoid single node overload.
  • Tiered caching: Combine in-memory cache with local caches or slower caches for large datasets.
  • Network optimization: Use fast networks and colocate cache nodes near app servers to reduce latency.
  • Monitoring and autoscaling: Track cache hit rates, latency, and resource usage to scale nodes dynamically.
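Client-side sharding is typically done with consistent hashing, so that adding or removing a cache node remaps only a small fraction of keys instead of invalidating the whole cache. A minimal sketch (node names and the virtual-node count are illustrative, not a real client library):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to cache nodes on a hash ring with virtual nodes,
    so resizing the cluster only remaps ~1/N of the keys."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node, vnodes)

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node, vnodes=100):
        # Each physical node gets many points on the ring for even spread.
        for i in range(vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self.ring, (self._hash(key), ""))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]
```

The same idea is what proxies like Twemproxy or Redis Cluster's hash slots provide in production; the point here is only the mechanism.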
Back-of-Envelope Cost Analysis
  • At 1M users, assume 10 requests per second per user -> 10M QPS total.
  • Redis single node handles ~100K ops/sec -> need ~100 nodes for caching all requests.
  • Cache size depends on data size; e.g., 1MB per user cached data -> 1TB total cache needed.
  • Network bandwidth: 100 nodes x 1 Gbps each -> 100 Gbps total bandwidth needed.
  • Storage cost: RAM is expensive; consider cost vs benefit of caching vs DB load.
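The estimates above are simple arithmetic and worth sanity-checking in an interview (the per-user request rate and per-node throughput are the stated assumptions, not measured numbers):

```python
# Back-of-envelope sizing for the 1M-user scenario.
users = 1_000_000
req_per_user_per_sec = 10                 # assumed peak request rate
total_qps = users * req_per_user_per_sec  # 10M QPS

ops_per_node = 100_000                    # rough single-node Redis throughput
nodes_needed = total_qps // ops_per_node  # ~100 nodes

bytes_per_user = 1 * 1024**2              # assume 1 MB of cached data per user
total_cache_bytes = users * bytes_per_user  # ~1 TB of RAM across the cluster
```

With ~100 nodes at 1 Gbps each, the 100 Gbps aggregate-bandwidth figure follows the same way.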
Interview Tip

Start by explaining what distributed caching is and why it helps reduce database load and latency.

Discuss scaling limits of a single cache node (memory, CPU, network).

Explain horizontal scaling with sharding and replication.

Mention eviction policies and cache consistency challenges.

Use numbers to justify when to add nodes or optimize network.

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add a distributed cache layer (Redis/Memcached) to offload read requests from the database, reducing DB load and latency. Then monitor cache hit rate and scale cache nodes as needed.
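"Add a cache layer" in practice usually means the cache-aside pattern: try the cache first, fall back to the database on a miss, then populate the cache with a TTL. A minimal in-process sketch, with dicts standing in for Redis and the database (a real deployment would use a Redis client such as redis-py):

```python
import time

DB = {"user:1": {"name": "Ada"}}  # stand-in for the database
cache = {}                        # stand-in for Redis: key -> (value, expiry)
TTL = 60                          # cache entry lifetime in seconds (assumed)

def get_user(key):
    """Cache-aside read: cache hit if present and unexpired, else read the
    database and repopulate the cache."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                       # cache hit
    value = DB.get(key)                       # cache miss: go to the database
    if value is not None:
        cache[key] = (value, time.time() + TTL)
    return value
```

With a high hit rate, most of the 10,000 QPS never reaches the database, which is exactly the offloading the answer describes.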

Key Result
Distributed caching scales well by adding cache nodes and sharding data across them, but per-node memory and CPU limits mean that once traffic exceeds roughly 100K ops/sec per node, you must scale horizontally and rely on eviction policies to keep each node within its memory budget.