
Design a key-value store in HLD - System Design Exercise

Design: Key-Value Store
Design covers core key-value storage, data replication, and client API. Does not cover advanced features like transactions, complex queries, or multi-region geo-distribution.
Functional Requirements
FR1: Store and retrieve data as key-value pairs
FR2: Support basic operations: put(key, value), get(key), delete(key)
FR3: Handle up to 1 million keys
FR4: Provide low latency for read and write operations (p99 < 50ms)
FR5: Ensure data durability and availability
FR6: Support concurrent access from multiple clients
Non-Functional Requirements
NFR1: System should be highly available with 99.9% uptime
NFR2: The system provides eventual consistency (clients may briefly read stale values)
NFR3: Storage should scale horizontally
NFR4: Latency target: p99 < 50ms for reads and writes
NFR5: Support up to 1000 concurrent clients
Think Before You Design
Questions to Ask
❓ Question 1
❓ Question 2
❓ Question 3
❓ Question 4
❓ Question 5
❓ Question 6
Key Components
Client API layer
Storage engine (in-memory or disk-based)
Indexing mechanism for fast key lookup
Replication module for data durability
Cache layer for hot keys
Load balancer or proxy for request distribution
Design Patterns
Sharding to distribute keys across nodes
Replication for fault tolerance
Consistent hashing for key distribution
Write-ahead logging for durability
Cache-aside pattern for read optimization
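The consistent-hashing pattern above can be sketched as a hash ring with virtual nodes. This is a minimal illustrative implementation (class and parameter names are assumptions, not part of the design): adding or removing a node only remaps the keys adjacent to it on the ring.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes; maps each key to the node whose
    vnode is the next clockwise position on the ring (sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes      # virtual nodes per physical node
        self._ring = []           # sorted list of (hash, node) tuples
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        # First vnode with hash >= hash(key), wrapping around the ring
        idx = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[idx][1]
```

Because only the removed node's vnodes disappear, keys owned by the surviving nodes keep their assignments, which is the property that makes rebalancing cheap.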
Reference Architecture
Client(s)
   |
Load Balancer / Proxy
   |
+--------------------------+
|     KV Store Cluster     |
| +-------+  +-------+     |
| | Node1 |  | Node2 | ... |
| +-------+  +-------+     |
+--------------------------+
   |
Persistent Storage (Disk)
Components
Client API
REST/gRPC
Interface for clients to perform put, get, delete operations
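The put/get/delete surface can be sketched as a minimal in-process store. This is a hypothetical stand-in for the node behind the REST/gRPC interface; it shows the operation contract only, with no durability or replication.

```python
class KVStore:
    """Minimal sketch of the client-facing API: put, get, delete."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        # Returns None for a missing key rather than raising
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # True if the key existed and was removed
        return self._data.pop(key, None) is not None
```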
Load Balancer / Proxy
Nginx or custom proxy
Distributes client requests to KV nodes based on consistent hashing
KV Store Nodes
In-memory store with disk persistence (e.g., RocksDB)
Store key-value pairs, handle requests, replicate data
Replication Module
Asynchronous replication protocol
Replicates data to other nodes for durability and availability
Persistent Storage
SSD-backed storage
Durable storage of data with write-ahead logging
Request Flow
1. Client sends a put/get/delete request to the Load Balancer
2. Load Balancer applies consistent hashing to the key to select a KV node
3. The KV node processes the request:
   - put: store the key-value pair in memory and the write-ahead log, then replicate asynchronously
   - get: look up the key in the in-memory store and return the value
   - delete: remove the key from the store and replicate the deletion
4. The KV node acknowledges the client after the local write (put/delete) or returns the value (get)
5. The replication module sends updates to replica nodes asynchronously
6. Replica nodes apply the updates to maintain eventual consistency
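The flow above, in particular acknowledging after the local write while replication proceeds in the background, can be sketched with a queue and a worker thread. The replicas here are plain dicts standing in for remote nodes (an assumption for illustration); between the local write and the worker draining the queue, replicas are stale, which is exactly the eventual-consistency window.

```python
import queue
import threading

class ReplicatedStore:
    """Asynchronous replication sketch: the primary acknowledges after
    its local write; a background thread ships updates to replicas."""

    def __init__(self, replicas):
        self._data = {}
        self._replicas = replicas          # dicts standing in for replica nodes
        self._updates = queue.Queue()
        threading.Thread(target=self._replicate, daemon=True).start()

    def put(self, key, value):
        self._data[key] = value            # local write; client is acked here,
        self._updates.put(("put", key, value))  # before replicas see the update

    def delete(self, key):
        self._data.pop(key, None)
        self._updates.put(("del", key, None))

    def _replicate(self):
        while True:
            op, key, value = self._updates.get()
            for replica in self._replicas:
                if op == "put":
                    replica[key] = value
                else:
                    replica.pop(key, None)
            self._updates.task_done()

    def flush(self):
        self._updates.join()               # wait for replication to catch up
```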
Database Schema
Entity: KeyValue
Attributes: key (string, primary), value (blob), timestamp (for versioning)
Relationships: none (flat key-value pairs)
Indexes: primary index on key for fast lookup
Scaling Discussion
Bottlenecks
Single node storage capacity limits total data size
Load balancer can become a bottleneck under high concurrency
Replication lag can cause stale reads
Disk I/O limits write throughput
Memory limits in-memory cache size
Solutions
Use sharding with consistent hashing to distribute keys across multiple nodes
Deploy multiple load balancers with DNS round-robin or anycast
Implement quorum-based reads/writes for stronger consistency if needed
Use SSDs and optimize write-ahead logging for faster disk writes
Implement cache eviction policies and tiered storage to manage memory
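The cache-eviction point above can be sketched with an LRU cache plus a cache-aside read path: look in the cache first, fall back to the backing store on a miss, and populate the cache under a fixed capacity. Names are illustrative assumptions.

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache with least-recently-used eviction (sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)      # mark as recently used
            return self._items[key]
        return None                            # miss (None values not supported)

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict least recently used

def read_through(cache, store, key):
    """Cache-aside read: hit the cache, else fetch from the store
    and populate the cache."""
    value = cache.get(key)
    if value is None:
        value = store.get(key)
        if value is not None:
            cache.put(key, value)
    return value
```

Bounding the cache this way addresses the memory bottleneck: hot keys stay resident while cold keys are evicted and served from the nodes' persistent storage.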
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 15 minutes designing architecture and data flow, 10 minutes discussing scaling and trade-offs, 10 minutes for questions and wrap-up.
Clarify consistency and durability requirements early
Explain choice of consistent hashing for key distribution
Discuss replication strategy and its impact on availability
Highlight how latency targets influence design choices
Mention trade-offs between strong and eventual consistency
Address scaling challenges and mitigation strategies