
Consistent hashing in HLD - Scalability & System Analysis

Scalability Analysis - Consistent hashing
Growth Table: Consistent Hashing at Different Scales
| Users / Requests | Number of Nodes | Data Distribution | Node Changes Impact | Load Balancing |
|---|---|---|---|---|
| 100 users | 3-5 nodes | Simple hashing works; consistent hashing optional | Minimal impact; rehashing is simple | Basic round-robin or hashing |
| 10K users | 10-50 nodes | Consistent hashing reduces data movement on node changes | Small portion of keys remapped | Virtual nodes improve balance |
| 1M users | 100-500 nodes | Data distribution critical; consistent hashing essential | Node failure remaps only ~1/n of keys | Virtual nodes and replication |
| 100M users | 1000+ nodes | Highly dynamic cluster; consistent hashing with replication and multi-ring | Efficient scaling, minimal data movement | Advanced load balancing, monitoring, and auto-scaling |
First Bottleneck

At small scale, the first bottleneck is uneven data distribution, which overloads some nodes while others sit mostly idle.

As the system grows, the bottleneck shifts to the cost of rehashing and data movement when nodes join or leave.

Without consistent hashing, large-scale systems suffer from massive data reshuffling, causing downtime and performance drops.
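The reshuffling cost is easy to demonstrate. A minimal sketch (using MD5 purely for illustration): with naive `hash(key) % num_nodes` placement, growing a cluster from 10 to 11 nodes moves roughly 10/11 of all keys, because a key stays put only when its hash lands on the same node id under both moduli.

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    """Naive placement: hash the key and take it modulo the node count."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"key-{i}" for i in range(10_000)]
# Compare placements before (10 nodes) and after (11 nodes) the resize.
moved = sum(node_for(k, 10) != node_for(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 90%
```

Consistent hashing reduces that ~90% figure to roughly 1/(n+1) of keys, which is the whole point of the technique.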

Scaling Solutions
  • Consistent Hashing: Minimizes data movement by mapping keys and nodes on a hash ring.
  • Virtual Nodes: Each physical node hosts multiple virtual nodes to improve load balance.
  • Replication: Store copies of data on multiple nodes for fault tolerance and read scalability.
  • Multi-Ring Hashing: Use multiple hash rings for different data types or regions to isolate impact.
  • Auto-Scaling: Dynamically add or remove nodes with minimal disruption.
  • Monitoring & Load Balancing: Track node load and redistribute virtual nodes if imbalance occurs.
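The first two ideas above can be sketched together. This is a minimal, illustrative ring (class name, MD5 hashing, and 100 virtual nodes per physical node are all assumptions, not a production design): each node is hashed to many points on the ring, and a key is served by the first virtual node clockwise of its hash.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal hash ring with virtual nodes (illustrative sketch only)."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes
        self.hashes = []    # sorted virtual-node positions on the ring
        self.owner = {}     # ring position -> physical node name

    def _hash(self, s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each physical node contributes `vnodes` points on the ring.
        for i in range(self.vnodes):
            h = self._hash(f"{node}#vn{i}")
            bisect.insort(self.hashes, h)
            self.owner[h] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#vn{i}")
            self.hashes.remove(h)
            del self.owner[h]

    def node_for(self, key: str) -> str:
        # First virtual node clockwise of the key, wrapping past the top.
        h = self._hash(key)
        idx = bisect.bisect(self.hashes, h) % len(self.hashes)
        return self.owner[self.hashes[idx]]

ring = ConsistentHashRing()
for n in range(10):
    ring.add_node(f"node-{n}")
keys = [f"key-{i}" for i in range(5_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-10")  # scale out by one node
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/11
```

Adding the 11th node moves only about 1/(n+1) of the keys, and the virtual nodes keep the per-node share close to even; replication is typically layered on top by also writing each key to the next few distinct nodes clockwise.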
Back-of-Envelope Cost Analysis

Assuming 1M keys and 100 nodes:

  • Each node handles ~10,000 keys.
  • Adding/removing a node remaps ~1% of keys (~10,000 keys).
  • Requests per second (RPS): If total RPS is 100,000, each node handles ~1,000 RPS.
  • Network bandwidth per node depends on key size and replication factor; e.g., 1 KB per key * 1,000 RPS = ~1 MB/s.
  • Storage scales linearly with keys per node.
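The arithmetic above can be written out directly (the 1 KB payload size is the same assumption as in the text, and replication factor is left out):

```python
# Back-of-envelope numbers: 1M keys, 100 nodes, 100K total RPS, 1 KB per key.
total_keys, nodes, total_rps, kb_per_key = 1_000_000, 100, 100_000, 1

keys_per_node = total_keys // nodes          # 10,000 keys per node
remapped = total_keys // (nodes + 1)         # ~1% of keys move when a node joins
rps_per_node = total_rps // nodes            # 1,000 RPS per node
mb_per_s = rps_per_node * kb_per_key / 1024  # ~1 MB/s per node, before replication

print(keys_per_node, remapped, rps_per_node, round(mb_per_s, 2))
```

Multiplying the bandwidth and storage figures by the replication factor gives the true per-node load once copies are included.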
Interview Tip

Start by explaining the problem of data distribution and node churn in distributed systems.

Describe how consistent hashing solves these by minimizing data movement.

Discuss virtual nodes and replication to improve balance and fault tolerance.

Use examples with increasing scale to show how the system adapts.

Highlight trade-offs and monitoring needs.

Self Check

Your database handles 1000 QPS. Traffic grows 10x. What do you do first?

Answer: Add more nodes and use consistent hashing to redistribute keys with minimal data movement, ensuring no single node is overloaded.

Key Result
Consistent hashing enables smooth scaling by minimizing data movement during node changes, making it ideal for large dynamic distributed systems.