0
0
Data-structures-theoryConceptBeginner · 3 min read

Double Hashing: What It Is and How It Works in Data Structures

Double hashing is a technique used in hash tables to resolve collisions by using two different hash functions. When a collision occurs, the second hash function calculates a step size to find the next available slot, reducing clustering and improving performance.
⚙️

How It Works

Imagine you have a row of mailboxes and you want to put letters in them based on their addresses. Sometimes, two letters might want to go into the same mailbox, causing a collision. Double hashing helps by using two different ways (hash functions) to decide where to put the letter.

The first hash function picks an initial mailbox. If it's taken, the second hash function tells you how many mailboxes to skip before checking the next one. This skipping pattern continues until an empty mailbox is found. This method spreads out the letters more evenly and avoids bunching up in one area.

💻

Example

This example shows how double hashing finds a place for keys in a hash table of size 7.

python
def double_hashing_insert(hash_table, key, size):
    def hash1(k):
        return k % size

    def hash2(k):
        return 1 + (k % (size - 1))

    index = hash1(key)
    step = hash2(key)
    i = 0

    while hash_table[(index + i * step) % size] is not None:
        i += 1

    hash_table[(index + i * step) % size] = key


# Create empty hash table
size = 7
hash_table = [None] * size

# Insert keys
keys = [10, 20, 5, 15]
for key in keys:
    double_hashing_insert(hash_table, key, size)

print(hash_table)
Output
[None, 15, None, 10, 20, None, 5]
🎯

When to Use

Double hashing is useful when you want to store data in a hash table and avoid long chains of collisions that slow down searching and inserting. It works well when the hash table is not too full and you want to reduce clustering, which happens when many keys end up close together.

Real-world uses include databases, caches, and any system that needs fast lookup and insertion of data without using extra memory for linked lists or trees.

Key Points

  • Double hashing uses two hash functions to handle collisions.
  • The second hash function determines the step size for probing.
  • It reduces clustering compared to simpler methods like linear probing.
  • It requires the hash table size to be chosen carefully, often a prime number.
  • It is efficient for open addressing hash tables with moderate load factors.

Key Takeaways

Double hashing resolves collisions by using two hash functions to find empty slots.
It reduces clustering and improves performance in hash tables with open addressing.
The second hash function provides a step size to probe different positions.
Choosing a prime number for the table size helps ensure all slots can be probed.
Double hashing is ideal for fast data lookup without extra memory overhead.