| Scale (Users) | Number of Nodes | Data Distribution | Impact of Node Changes | Load Balancing |
|---|---|---|---|---|
| 100 users | 3-5 nodes | Simple hashing works, consistent hashing optional | Minimal impact, rehashing simple | Basic round-robin or hashing |
| 10K users | 10-50 nodes | Consistent hashing reduces data movement on node changes | Small portion of keys remapped | Virtual nodes improve balance |
| 1M users | 100-500 nodes | Data distribution critical, consistent hashing essential | Node failure causes minimal remapping (~1/n keys) | Use virtual nodes and replication |
| 100M users | 1000+ nodes | Highly dynamic cluster, consistent hashing with replication and multi-ring | Efficient scaling, minimal data movement | Advanced load balancing, monitoring, and auto-scaling |
Consistent Hashing in HLD: Scalability & System Analysis
At small scale, the first bottleneck is uneven data distribution, which leaves some nodes overloaded while others sit nearly idle.
As the system grows, the bottleneck shifts to the cost of rehashing and moving data whenever nodes join or leave.
Without consistent hashing (e.g., with naive `hash(key) % N` placement), a membership change forces most keys to move to new nodes, causing massive data reshuffling, downtime, and performance drops.
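The reshuffling cost is easy to demonstrate with a quick comparison: modulo placement versus a simplified hash ring (no virtual nodes). This is an illustrative sketch; the key and node names are made up, and MD5 is used only as a convenient hash function.

```python
import hashlib

def h(s: str) -> int:
    """Hash a string to a 128-bit integer (MD5, for illustration only)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

keys = [f"user:{i}" for i in range(5_000)]

# Naive placement: node = hash(key) % N.
# Growing from 10 to 11 nodes changes the owner of most keys (~10/11 of them).
moved_mod = sum(1 for k in keys if h(k) % 10 != h(k) % 11)

# Simplified consistent hashing: each key belongs to the first node
# clockwise from its position on the ring; only keys falling on the
# new node's arc change owners (~1/11 of them on average).
def ring_owner(key: str, nodes: list[str]) -> str:
    kh = h(key)
    return min(nodes, key=lambda n: (h(n) - kh) % 2**128)

nodes_10 = [f"node{i}" for i in range(10)]
nodes_11 = nodes_10 + ["node10"]
moved_ring = sum(1 for k in keys
                 if ring_owner(k, nodes_10) != ring_owner(k, nodes_11))
```

Running this, `moved_mod` lands near 90% of the keys while `moved_ring` stays far lower, which is exactly the reshuffling gap the notes describe.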
- Consistent Hashing: Minimizes data movement by mapping keys and nodes on a hash ring.
- Virtual Nodes: Each physical node hosts multiple virtual nodes to improve load balance.
- Replication: Store copies of data on multiple nodes for fault tolerance and read scalability.
- Multi-Ring Hashing: Use multiple hash rings for different data types or regions to isolate impact.
- Auto-Scaling: Dynamically add or remove nodes with minimal disruption.
- Monitoring & Load Balancing: Track node load and redistribute virtual nodes if imbalance occurs.
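The ring and virtual-node ideas above can be sketched in a few dozen lines. This is a minimal illustration, not a production implementation; the class name, `vnodes` parameter, and MD5 hashing are all assumptions made for the example.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes   # virtual nodes per physical node
        self.keys = []         # sorted hash points on the ring
        self.ring = []         # parallel list of (hash, physical node)

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        """Place `vnodes` points for this node on the ring."""
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            idx = bisect.bisect(self.keys, h)
            self.keys.insert(idx, h)
            self.ring.insert(idx, (h, node))

    def remove_node(self, node: str) -> None:
        """Drop all of this node's points; its keys fall to the next node clockwise."""
        points = [(h, n) for h, n in self.ring if n != node]
        self.ring = points
        self.keys = [h for h, _ in points]

    def get_node(self, key: str) -> str:
        """Route a key to the first node point clockwise from its hash."""
        if not self.keys:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]
```

Because each physical node contributes many scattered points, its share of the ring averages out, which is why virtual nodes improve balance; adding a node only captures keys on the arcs its new points cover.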
Assuming 1M keys and 100 nodes:
- Each node handles ~10,000 keys.
- Adding/removing a node remaps ~1% of keys (~10,000 keys).
- Requests per second (RPS): If total RPS is 100,000, each node handles ~1,000 RPS.
- Network bandwidth per node depends on key size and replication factor; e.g., 1 KB per key * 1,000 RPS = ~1 MB/s.
- Storage scales linearly with keys per node.
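The arithmetic above can be written out directly (the ~1 KB value size is the assumption stated in the notes):

```python
# Back-of-envelope numbers from the estimates above (1M keys, 100 nodes).
TOTAL_KEYS = 1_000_000
NODES = 100
TOTAL_RPS = 100_000
VALUE_SIZE_BYTES = 1_000       # assumed ~1 KB per key

keys_per_node = TOTAL_KEYS // NODES                   # 10,000 keys
remap_fraction = 1 / NODES                            # ~1% per node change
keys_remapped = int(TOTAL_KEYS * remap_fraction)      # ~10,000 keys
rps_per_node = TOTAL_RPS // NODES                     # 1,000 RPS
bandwidth_per_node = rps_per_node * VALUE_SIZE_BYTES  # ~1 MB/s
```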
Start by explaining the problem of data distribution and node churn in distributed systems.
Describe how consistent hashing solves these by minimizing data movement.
Discuss virtual nodes and replication to improve balance and fault tolerance.
Use examples with increasing scale to show how the system adapts.
Highlight trade-offs and monitoring needs.
Your database handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: Scale out: add nodes and use consistent hashing to redistribute keys with minimal data movement, so no single node becomes a hotspot.