
Design a key-value store in HLD - System Design Exercise

Design: Key-Value Store
Design covers core key-value storage, data replication, and client API. Does not cover advanced features like transactions, complex queries, or multi-region geo-distribution.
Functional Requirements
FR1: Store and retrieve data as key-value pairs
FR2: Support basic operations: put(key, value), get(key), delete(key)
FR3: Handle up to 1 million keys
FR4: Provide low latency for read and write operations (p99 < 50ms)
FR5: Ensure data durability and availability
FR6: Support concurrent access from multiple clients
Non-Functional Requirements
NFR1: System should be highly available with 99.9% uptime
NFR2: The system provides eventual consistency (clients may briefly read stale values)
NFR3: Storage should scale horizontally
NFR4: Latency target: p99 < 50ms for reads and writes
NFR5: Support up to 1000 concurrent clients
Think Before You Design
Questions to Ask
❓ Question 1
❓ Question 2
❓ Question 3
❓ Question 4
❓ Question 5
❓ Question 6
Key Components
Client API layer
Storage engine (in-memory or disk-based)
Indexing mechanism for fast key lookup
Replication module for data durability
Cache layer for hot keys
Load balancer or proxy for request distribution
Design Patterns
Sharding to distribute keys across nodes
Replication for fault tolerance
Consistent hashing for key distribution
Write-ahead logging for durability
Cache-aside pattern for read optimization
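The consistent-hashing pattern above can be sketched as a hash ring with virtual nodes. This is a minimal illustrative implementation (class and parameter names are assumptions, not part of the design): adding or removing a node only remaps the keys adjacent to it on the ring.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes; maps each key to the node whose
    vnode is the next clockwise position on the ring (sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes      # virtual nodes per physical node
        self._ring = []           # sorted list of (hash, node) tuples
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        # First vnode with hash >= hash(key), wrapping around the ring
        idx = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[idx][1]
```

Because only the removed node's vnodes disappear, keys owned by the surviving nodes keep their assignments, which is the property that makes rebalancing cheap.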
Reference Architecture
Client(s)
   |
Load Balancer / Proxy
   |
+--------------------------+
|     KV Store Cluster     |
| +-------+  +-------+     |
| | Node1 |  | Node2 | ... |
| +-------+  +-------+     |
+--------------------------+
   |
Persistent Storage (Disk)
Components
Client API
REST/gRPC
Interface for clients to perform put, get, delete operations
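The put/get/delete surface can be sketched as a minimal in-process store. This is a hypothetical stand-in for the node behind the REST/gRPC interface; it shows the operation contract only, with no durability or replication.

```python
class KVStore:
    """Minimal sketch of the client-facing API: put, get, delete."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        # Returns None for a missing key rather than raising
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # True if the key existed and was removed
        return self._data.pop(key, None) is not None
```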
Load Balancer / Proxy
Nginx or custom proxy
Distributes client requests to KV nodes based on consistent hashing
KV Store Nodes
In-memory store with disk persistence (e.g., RocksDB)
Store key-value pairs, handle requests, replicate data
Replication Module
Asynchronous replication protocol
Replicates data to other nodes for durability and availability
Persistent Storage
SSD-backed storage
Durable storage of data with write-ahead logging
Request Flow
1. Client sends a put/get/delete request to the Load Balancer
2. Load Balancer applies consistent hashing to the key to select a KV node
3. The KV node processes the request:
   - put: store the key-value pair in memory and the write-ahead log, then replicate asynchronously
   - get: look up the key in the in-memory store and return the value
   - delete: remove the key from the store and replicate the deletion
4. The KV node acknowledges the client after the local write (put/delete) or returns the value (get)
5. The replication module sends updates to replica nodes asynchronously
6. Replica nodes apply the updates to maintain eventual consistency
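The flow above, in particular acknowledging after the local write while replication proceeds in the background, can be sketched with a queue and a worker thread. The replicas here are plain dicts standing in for remote nodes (an assumption for illustration); between the local write and the worker draining the queue, replicas are stale, which is exactly the eventual-consistency window.

```python
import queue
import threading

class ReplicatedStore:
    """Asynchronous replication sketch: the primary acknowledges after
    its local write; a background thread ships updates to replicas."""

    def __init__(self, replicas):
        self._data = {}
        self._replicas = replicas          # dicts standing in for replica nodes
        self._updates = queue.Queue()
        threading.Thread(target=self._replicate, daemon=True).start()

    def put(self, key, value):
        self._data[key] = value            # local write; client is acked here,
        self._updates.put(("put", key, value))  # before replicas see the update

    def delete(self, key):
        self._data.pop(key, None)
        self._updates.put(("del", key, None))

    def _replicate(self):
        while True:
            op, key, value = self._updates.get()
            for replica in self._replicas:
                if op == "put":
                    replica[key] = value
                else:
                    replica.pop(key, None)
            self._updates.task_done()

    def flush(self):
        self._updates.join()               # wait for replication to catch up
```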
Database Schema
Entity: KeyValue
Attributes: key (string, primary), value (blob), timestamp (for versioning)
Relationships: none (flat key-value pairs)
Indexes: primary index on key for fast lookup
Scaling Discussion
Bottlenecks
Single node storage capacity limits total data size
Load balancer can become a bottleneck under high concurrency
Replication lag can cause stale reads
Disk I/O limits write throughput
Memory limits in-memory cache size
Solutions
Use sharding with consistent hashing to distribute keys across multiple nodes
Deploy multiple load balancers with DNS round-robin or anycast
Implement quorum-based reads/writes for stronger consistency if needed
Use SSDs and optimize write-ahead logging for faster disk writes
Implement cache eviction policies and tiered storage to manage memory
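The cache-eviction point above can be sketched with an LRU cache plus a cache-aside read path: look in the cache first, fall back to the backing store on a miss, and populate the cache under a fixed capacity. Names are illustrative assumptions.

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache with least-recently-used eviction (sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)      # mark as recently used
            return self._items[key]
        return None                            # miss (None values not supported)

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict least recently used

def read_through(cache, store, key):
    """Cache-aside read: hit the cache, else fetch from the store
    and populate the cache."""
    value = cache.get(key)
    if value is None:
        value = store.get(key)
        if value is not None:
            cache.put(key, value)
    return value
```

Bounding the cache this way addresses the memory bottleneck: hot keys stay resident while cold keys are evicted and served from the nodes' persistent storage.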
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 15 minutes designing architecture and data flow, 10 minutes discussing scaling and trade-offs, 10 minutes for questions and wrap-up.
Clarify consistency and durability requirements early
Explain choice of consistent hashing for key distribution
Discuss replication strategy and its impact on availability
Highlight how latency targets influence design choices
Mention trade-offs between strong and eventual consistency
Address scaling challenges and mitigation strategies