What is Consistent Hashing: Explanation and Use Cases
hash function to assign both servers and data to points on a circle, minimizing disruption and improving scalability.How It Works
Imagine a circular ring like a clock face. Both servers and data items are placed on this ring using a hash function that converts their IDs into positions on the circle. Each data item is stored on the first server found by moving clockwise from its position.
This way, when a server joins or leaves, only the data items near that server on the ring need to move, not all data. This is like adding or removing a seat at a round table; only the neighbors next to that seat are affected.
This approach avoids the problem of reshuffling all data, which happens in simple hashing methods, making consistent hashing ideal for systems that change often.
Example
This Python example shows a simple consistent hashing ring with servers and data keys. It finds which server each key maps to.
import hashlib class ConsistentHashRing: def __init__(self, nodes=None): self.ring = dict() self.sorted_keys = [] if nodes: for node in nodes: self.add_node(node) def _hash(self, key): return int(hashlib.md5(key.encode()).hexdigest(), 16) def add_node(self, node): key = self._hash(node) self.ring[key] = node self.sorted_keys.append(key) self.sorted_keys.sort() def remove_node(self, node): key = self._hash(node) del self.ring[key] self.sorted_keys.remove(key) def get_node(self, key_str): key = self._hash(key_str) for node_key in self.sorted_keys: if key <= node_key: return self.ring[node_key] return self.ring[self.sorted_keys[0]] # Setup ring with 3 servers ring = ConsistentHashRing(['ServerA', 'ServerB', 'ServerC']) # Map some data keys keys = ['apple', 'banana', 'cherry', 'date', 'fig'] for key in keys: print(f"{key} -> {ring.get_node(key)}")
When to Use
Use consistent hashing when you have a distributed system with many servers that store data, and these servers can be added or removed often. It helps keep the system balanced and reduces the cost of moving data.
Common real-world uses include:
- Distributed caches like Memcached or Redis clusters
- Load balancing in web servers
- Distributed databases and storage systems
It is especially useful when you want to avoid downtime or heavy data reshuffling during scaling.
Key Points
- Consistent hashing maps servers and data to a circle using a hash function.
- Only data near a changed server moves when servers join or leave.
- It improves scalability and fault tolerance in distributed systems.
- Widely used in caching, load balancing, and distributed storage.