Load factor and rehashing in Data Structures Theory - Time & Space Complexity
When using hash tables, it is important to understand how the load factor and rehashing affect performance.
We want to know how the time to insert or find items changes as the table fills up and grows.
Analyze the time complexity of this simplified hash table insertion with rehashing.
```javascript
let tableSize = 8;               // initial capacity (illustrative choice)
let table = new Array(tableSize);
let numberOfItems = 0;

// Simple string hash (assumed for illustration; any hash function works)
function hash(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.charCodeAt(0)) | 0;
  }
  return Math.abs(h);
}

function insert(key, value) {
  if (loadFactor() > 0.7) {
    rehash();
  }
  const index = hash(key) % tableSize;
  // Simplified: collisions are not handled; a colliding key overwrites the slot
  table[index] = { key: key, value: value };
  numberOfItems++;
}

function loadFactor() {
  return numberOfItems / tableSize;
}

function rehash() {
  const oldTable = table;
  tableSize = tableSize * 2;        // double the capacity
  table = new Array(tableSize);
  numberOfItems = 0;
  for (const item of oldTable) {
    if (item !== undefined) {
      insert(item.key, item.value); // re-insert into the larger table
    }
  }
}
```
This code inserts items into a hash table; whenever the load factor exceeds 0.7 (the table is more than 70% full), it doubles the table size and re-inserts every existing item.
Look at what repeats as the input grows.
- Primary operation: Inserting items into the hash table.
- How many times: Each insert is usually quick, but when rehashing happens, all existing items are re-inserted once.
Normally, inserting one item takes about the same time, O(1) on average, regardless of how large the table is.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 simple inserts, occasional rehashing |
| 100 | About 100 inserts, with a few rehashes doubling the table |
| 1000 | About 1000 inserts, with several rehashes each re-inserting many items |
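To make the table above concrete, here is a small self-contained simulation (the `countOperations` helper and the starting capacity of 8 are illustrative assumptions, not part of the code above) that counts every placement and every rehash re-insertion while n items are added to a doubling table with a 0.7 threshold:

```javascript
// Count total work (placements + rehash re-insertions) for n inserts
// into a table that doubles when the load factor exceeds 0.7.
function countOperations(n) {
  let capacity = 8; // assumed initial table size
  let size = 0;
  let operations = 0;
  for (let i = 0; i < n; i++) {
    if (size / capacity > 0.7) {
      operations += size; // a rehash re-inserts every existing item
      capacity *= 2;
    }
    operations += 1; // the insert itself
    size += 1;
  }
  return operations;
}

for (const n of [10, 100, 1000]) {
  console.log(`n=${n}: ${countOperations(n)} operations`);
}
```

Running this shows the operation count staying within a small constant multiple of n, which is exactly the linear growth this section describes.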
As the table grows, rehashing happens less often but costs more each time because it re-inserts all items.
Time Complexity: O(n) total for n insertions, which is amortized O(1) per insert
This means that inserting n items takes time proportional to n even with rehashing included: a single insert that triggers a rehash costs O(n), but rehashes are rare enough that the average cost per insert stays constant.
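The linear total comes from a geometric series. In an idealized model where the table doubles only when completely full, the rehash that brings capacity up to n re-inserts n/2 items, the one before it n/4, and so on, so all rehashing together costs less than n extra operations (a lower threshold like 0.7 changes the constant, not the linear growth). A quick check of that bound, using an illustrative helper `rehashCost`:

```javascript
// Sum the re-insertion costs of all doublings needed to reach n items
// in the idealized "double when full" model: n/2 + n/4 + ... < n.
function rehashCost(n) {
  let total = 0;
  for (let cost = Math.floor(n / 2); cost >= 1; cost = Math.floor(cost / 2)) {
    total += cost;
  }
  return total;
}

console.log(rehashCost(1000)); // strictly less than 1000
```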
[X] Wrong: "Rehashing makes insertion very slow every time."
[OK] Correct: Rehashing happens only occasionally, so its cost spreads out, keeping average insertion fast.
Understanding load factor and rehashing helps you explain how hash tables stay efficient as they grow, a key skill in many coding discussions.
"What if the load factor threshold was set to 0.9 instead of 0.7? How would that affect the time complexity?"