
Cache stampede prevention in HLD - System Design Exercise

Design: Cache Stampede Prevention System
The design focuses on the interaction between the cache layer and the backend to prevent stampedes. Out of scope: detailed backend business logic and frontend UI design.
Functional Requirements
FR1: Prevent multiple requests from overwhelming the backend when cache expires
FR2: Serve cached data with low latency for read-heavy workloads
FR3: Ensure data consistency between cache and backend
FR4: Handle sudden spikes in traffic gracefully
FR5: Support cache expiration and refresh without blocking user requests
Non-Functional Requirements
NFR1: Support up to 50,000 concurrent users
NFR2: API response latency p99 under 150ms
NFR3: Availability target of 99.9% uptime
NFR4: Cache expiration time configurable between 1 minute and 1 hour
Think Before You Design
Questions to Ask
❓ What is the expected read/write ratio, and how hot are the hottest keys?
❓ How stale can served data be — is briefly serving expired data acceptable?
❓ Is the cache shared across all application servers, or local to each?
❓ How expensive is a cache rebuild (backend query latency and cost)?
❓ What should happen if the cache or the lock service is unavailable?
Key Components
Cache storage (Redis or Memcached)
Backend data source (database or API)
Locking mechanism (distributed locks or mutex)
Request queue or token bucket for throttling
Cache refresh logic (lazy or proactive)
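The "request queue or token bucket for throttling" component above can be sketched as a minimal in-process token bucket. This is an illustrative single-node version (the class name `TokenBucket` and its parameters are assumptions, not from the source); a shared limiter would typically live in Redis instead.

```python
import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Admit one request if a token is available, refilling lazily."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A burst of 15 requests against a bucket of capacity 10: the first 10 pass,
# the rest are throttled until tokens refill.
bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(15)]
```

A bucket like this placed in front of the cache-rebuild path caps how many misses can reach the backend per second, regardless of how many requests arrive.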
Design Patterns
Cache-aside pattern
Mutex lock or distributed lock for cache rebuild
Request coalescing to collapse concurrent misses for the same key into a single backend fetch
Early recomputation (refresh before expiry)
Probabilistic early expiration
Reference Architecture
Application Servers
Cache Layer (Redis)
Backend Database
Components
Cache Layer
Redis
Store cached data with expiration and support atomic operations for locking
Distributed Lock Service
Redis Redlock or Zookeeper
Coordinate locks to allow only one process to rebuild cache at a time
Backend Database
Relational or NoSQL DB
Source of truth for data to refresh cache
Application Servers
Stateless servers
Serve client requests, check cache, acquire lock if needed, rebuild cache
Request Flow
1. Client sends request to Application Server
2. Application Server checks Redis cache for data
3. If cache hit, return data immediately
4. If cache miss or expired, Application Server tries to acquire distributed lock
5. If lock acquired, Application Server fetches fresh data from Backend Database
6. Application Server updates cache with fresh data and releases lock
7. If lock not acquired, Application Server waits or returns stale cache data if available
8. Client receives data with minimal delay and backend is protected from overload
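The request flow above can be sketched as a minimal in-process simulation: a dict stands in for Redis, a `threading.Lock` per key stands in for the distributed lock, and `fetch_from_db` is a hypothetical stand-in for the backend. A production version would use an atomic Redis `SET NX` with a TTL as the lock instead.

```python
import threading
import time

TTL = 60
cache = {}                      # key -> (value, expiry_ts); stands in for Redis
locks = {}                      # key -> threading.Lock; stands in for a distributed lock
locks_guard = threading.Lock()  # protects the `locks` dict itself
db_calls = 0                    # counts backend hits, to show the stampede is prevented

def fetch_from_db(key):
    """Stand-in for the backend database (source of truth); deliberately slow."""
    global db_calls
    db_calls += 1
    time.sleep(0.05)
    return f"value-for-{key}"

def get(key):
    now = time.time()
    entry = cache.get(key)
    if entry and entry[1] > now:                 # steps 2-3: cache hit
        return entry[0]
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    if lock.acquire(blocking=False):             # step 4: try to take the rebuild lock
        try:
            value = fetch_from_db(key)           # step 5: only the lock holder hits the DB
            cache[key] = (value, time.time() + TTL)  # step 6: repopulate the cache
            return value
        finally:
            lock.release()
    if entry:                                    # step 7a: serve stale data if any exists
        return entry[0]
    with lock:                                   # step 7b: otherwise wait for the rebuild
        return cache[key][0]

# Simulate a burst of concurrent misses for the same key.
results = []
threads = [threading.Thread(target=lambda: results.append(get("user:1")))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Despite ten simultaneous misses, only the lock holder reaches the database; the other nine requests wait briefly and reuse its result, which is the stampede-prevention property the flow describes.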
Database Schema
Entity: DataItem { id (PK), value, last_updated }
Relationships: none needed for cache stampede prevention; the focus is on cache and locking coordination.
Scaling Discussion
Bottlenecks
Distributed lock contention under very high cache miss rates
Backend database overload if many cache rebuilds happen simultaneously
Cache storage limits and eviction policies under heavy load
Network latency between application servers and cache or lock service
Solutions
Use efficient distributed lock algorithms like Redlock with timeouts
Implement request coalescing to batch cache rebuild requests
Use early recomputation to refresh cache before expiry to reduce misses
Scale backend database with read replicas and caching layers
Partition cache keys to reduce lock contention
Use local caches with TTL to reduce load on central cache
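The request-coalescing solution above can be sketched as a "single-flight" group: concurrent callers for the same key share one in-progress computation instead of each hitting the backend. The class name `SingleFlight` and the `rebuild` function are illustrative assumptions, not part of the source design.

```python
import threading
import time
from concurrent.futures import Future

class SingleFlight:
    """Coalesce concurrent calls for the same key: the first caller runs fn();
    every other caller blocks on the same Future and shares that result."""
    def __init__(self):
        self._mu = threading.Lock()
        self._inflight = {}     # key -> Future for the in-progress call

    def do(self, key, fn):
        with self._mu:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._inflight[key] = fut
        if leader:
            try:
                fut.set_result(fn())
            except Exception as exc:
                fut.set_exception(exc)
            finally:
                with self._mu:
                    self._inflight.pop(key, None)
        return fut.result()

calls = 0
def rebuild():
    """Hypothetical slow cache-rebuild (e.g., a backend query)."""
    global calls
    calls += 1
    time.sleep(0.05)
    return "fresh-value"

group = SingleFlight()
results = []
threads = [threading.Thread(target=lambda: results.append(group.do("user:1", rebuild)))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Unlike the lock-plus-stale-fallback flow, coalescing guarantees every caller gets fresh data at the cost of waiting for the one rebuild; which trade-off is right depends on the staleness tolerance clarified in the requirements.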
Interview Tips
Time: spend 10 minutes understanding requirements and clarifying constraints, 20 minutes designing the architecture and data flow, 10 minutes discussing scaling and trade-offs, and 5 minutes summarizing.
Explain the cache stampede problem clearly with a real-life analogy (e.g., a crowd of shoppers all rushing in the moment a store opens)
Describe how distributed locks prevent multiple cache rebuilds
Discuss trade-offs between latency and data freshness
Mention fallback strategies like serving stale data
Highlight scaling challenges and mitigation techniques
Show awareness of failure modes like lock failures or cache inconsistencies