
Why caching reduces latency in HLD

Design: Caching System to Reduce Latency
Focus on caching layer design and its impact on latency reduction. Out of scope: detailed cache eviction policies and cache consistency mechanisms.
Functional Requirements
FR1: Serve user requests with minimal delay
FR2: Reduce load on primary data storage
FR3: Provide quick access to frequently requested data
Non-Functional Requirements
NFR1: Handle up to 10,000 requests per second
NFR2: API response time p99 under 100ms
NFR3: System availability 99.9%
Think Before You Design
Questions to Ask
Key Components
Cache storage (in-memory store like Redis or Memcached)
Primary database or data source
Application server
Cache invalidation or expiration mechanism
Design Patterns
Cache-aside pattern
Write-through and write-back caching
Time-to-live (TTL) based expiration
Lazy loading and prefetching
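To make TTL-based expiration concrete, here is a minimal sketch of an in-memory cache with lazy (on-read) expiration. A plain Python dict stands in for Redis or Memcached, and all names are illustrative:

```python
import time

class TTLCache:
    """Minimal in-memory cache with TTL-based expiration (stand-in for Redis/Memcached)."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None                    # cache miss
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]           # lazy expiration: evict on read
            return None
        return value

    def set(self, key, value, ttl_seconds=60):
        self._store[key] = (value, time.monotonic() + ttl_seconds)
```

Production caches typically combine this lazy, on-read expiration with a background sweep; Redis, for example, does both.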
Reference Architecture
Client
  |
  v
Application Server
  |
  v
+--------------------+
|    Cache Layer     |
| (Redis/Memcached)  |
+--------------------+
  |
  v
Primary Database
Components
Client (web or mobile app): sends requests to the application server.
Application Server (Node.js / Java / Python): handles client requests and checks the cache before the database.
Cache Layer (Redis or Memcached): stores frequently accessed data in memory for fast retrieval.
Primary Database (PostgreSQL / MySQL / NoSQL): stores the authoritative data.
Request Flow
1. The client sends a request to the application server.
2. The application server checks the cache for the requested data.
3. On a cache hit, the data is returned to the client immediately.
4. On a cache miss, the application server queries the primary database.
5. The retrieved data is stored in the cache for future requests.
6. The data is returned to the client.
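The flow above maps directly onto the cache-aside pattern. A minimal sketch, with a dict-backed cache and a dict standing in for the primary database (get_user and the key format are hypothetical):

```python
class SimpleCache:
    """Dict-backed stand-in for an in-memory cache such as Redis."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)      # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value

def get_user(user_id, cache, db):
    """Cache-aside read path for a single record."""
    key = f"user:{user_id}"
    value = cache.get(key)              # step 2: check the cache
    if value is not None:               # step 3: cache hit, return immediately
        return value
    value = db[user_id]                 # step 4: cache miss, query the database
    cache.set(key, value)               # step 5: populate the cache for next time
    return value                        # step 6: return to the caller
```

After the first miss populates the cache, subsequent reads for the same key never touch the database, which is exactly how FR2 (reduced load on primary storage) is achieved.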
Database Schema
Not applicable: this design focuses on the caching mechanism rather than on a specific database schema.
Scaling Discussion
Bottlenecks
Cache becoming a single point of failure or bottleneck under high load.
Cache misses causing increased load on primary database.
Stale data in cache leading to inconsistent responses.
Solutions
Use distributed cache clusters with replication and sharding to handle load and provide high availability.
Implement cache warming and prefetching to reduce cache misses.
Use TTL and cache invalidation strategies to keep data fresh.
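Sharding spreads keys across cache nodes so no single instance becomes the bottleneck. A minimal sketch of deterministic key-to-shard mapping (shard_for is a hypothetical helper; real deployments typically prefer consistent hashing, since plain modulo remaps most keys whenever the node count changes):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically map a cache key to one of num_shards cache nodes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Because the mapping is a pure function of the key, every application server routes a given key to the same node without any coordination.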
Interview Tips
Time: Spend 10 minutes explaining caching basics and why it reduces latency, 15 minutes on architecture and data flow, 10 minutes on scaling and trade-offs, and 10 minutes for questions.
Explain latency as the delay in getting data from the source.
Describe how cache stores data closer to the application for faster access.
Discuss cache hit vs cache miss and their impact on latency.
Mention cache placement and eviction strategies briefly.
Highlight scaling challenges and solutions like distributed caching.
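The hit/miss impact on latency can be quantified with a simple expected-value calculation. Assuming illustrative figures of 1 ms for a cache read and 50 ms for a database read (a miss pays both costs):

```python
def expected_latency_ms(hit_rate: float, cache_ms: float = 1.0, db_ms: float = 50.0) -> float:
    """Expected per-request latency: hits pay the cache cost, misses pay cache + DB."""
    return hit_rate * cache_ms + (1.0 - hit_rate) * (cache_ms + db_ms)
```

Under these assumed numbers, a 90% hit rate gives an expected latency of about 6 ms versus 51 ms with no cache at all, which is why a high hit rate is central to meeting latency targets such as NFR2.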