Double Hashing: What It Is and How It Works in Data Structures
hash tables to resolve collisions by using two different hash functions. When a collision occurs, the second hash function calculates a step size to find the next available slot, reducing clustering and improving performance.How It Works
Imagine you have a row of mailboxes and you want to put letters in them based on their addresses. Sometimes, two letters might want to go into the same mailbox, causing a collision. Double hashing helps by using two different ways (hash functions) to decide where to put the letter.
The first hash function picks an initial mailbox. If it's taken, the second hash function tells you how many mailboxes to skip before checking the next one. This skipping pattern continues until an empty mailbox is found. This method spreads out the letters more evenly and avoids bunching up in one area.
Example
This example shows how double hashing finds a place for keys in a hash table of size 7.
def double_hashing_insert(hash_table, key, size): def hash1(k): return k % size def hash2(k): return 1 + (k % (size - 1)) index = hash1(key) step = hash2(key) i = 0 while hash_table[(index + i * step) % size] is not None: i += 1 hash_table[(index + i * step) % size] = key # Create empty hash table size = 7 hash_table = [None] * size # Insert keys keys = [10, 20, 5, 15] for key in keys: double_hashing_insert(hash_table, key, size) print(hash_table)
When to Use
Double hashing is useful when you want to store data in a hash table and avoid long chains of collisions that slow down searching and inserting. It works well when the hash table is not too full and you want to reduce clustering, which happens when many keys end up close together.
Real-world uses include databases, caches, and any system that needs fast lookup and insertion of data without using extra memory for linked lists or trees.
Key Points
- Double hashing uses two hash functions to handle collisions.
- The second hash function determines the step size for probing.
- It reduces clustering compared to simpler methods like linear probing.
- It requires the hash table size to be chosen carefully, often a prime number.
- It is efficient for open addressing hash tables with moderate load factors.