Overview - HashMap Implementation from Scratch

What is it?

A HashMap is a data structure that stores key-value pairs. It allows fast access, insertion, and deletion of values based on their keys. Internally, it uses a function called a hash function to convert keys into indexes in an array. This way, it can quickly find where to store or look up data.

Why it matters

Without HashMaps, programs would need to search through lists or arrays one by one to find data, which is slow. HashMaps make lookups take roughly constant time on average, even with large amounts of data. This speed is crucial for many applications like databases, caches, and real-time systems.

Where it fits

Before learning HashMaps, you should understand arrays and basic data structures like lists. After mastering HashMaps, you can explore more complex structures like Trees, Graphs, and advanced hashing techniques.

Mental Model

Core Idea

A HashMap uses a hash function to turn keys into array positions, storing values so they can be found quickly without searching the whole collection.

Think of it like...

Imagine a library where each book has a unique code. Instead of searching every shelf, you use the code to go directly to the right shelf and spot. The hash function is like the code translator, and the array is the shelves.

HashMap Structure:

[Array]
  ├─ Index 0: Bucket (Linked List or Tree)
  ├─ Index 1: Bucket
  ├─ Index 2: Bucket
  ├─ ...
  └─ Index N: Bucket

Key --hash_function--> Index

Each bucket holds entries with keys that hash to the same index.
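One way to express the structure above in C is sketched below. The names `Entry`, `HashMap`, `hashmap_create`, and `hashmap_destroy` are illustrative assumptions, as are the choice of string keys, `int` values, and plain linked-list buckets.

```c
#include <assert.h>
#include <stdlib.h>

/* One key-value entry. Entries whose keys hash to the same index
   are chained together through `next`, forming a bucket. */
typedef struct Entry {
    const char *key;      /* borrowed pointer; the map does not copy the key */
    int value;
    struct Entry *next;
} Entry;

typedef struct {
    Entry **buckets;      /* array of bucket heads, one per index */
    size_t capacity;      /* number of buckets (array size) */
    size_t count;         /* number of stored entries */
} HashMap;

/* Allocate a map with `capacity` empty buckets; returns NULL on failure. */
HashMap *hashmap_create(size_t capacity) {
    assert(capacity > 0);
    HashMap *map = malloc(sizeof *map);
    if (map == NULL) return NULL;
    map->buckets = malloc(capacity * sizeof *map->buckets);
    if (map->buckets == NULL) {
        free(map);
        return NULL;
    }
    for (size_t i = 0; i < capacity; i++) map->buckets[i] = NULL;
    map->capacity = capacity;
    map->count = 0;
    return map;
}

/* Free every heap-allocated entry, then the array and the map itself. */
void hashmap_destroy(HashMap *map) {
    if (map == NULL) return;
    for (size_t i = 0; i < map->capacity; i++) {
        Entry *e = map->buckets[i];
        while (e != NULL) {
            Entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(map->buckets);
    free(map);
}
```

A caller would typically create the map once, for example with `hashmap_create(16)`, and release it with `hashmap_destroy` when done.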

Build-Up - 7 Steps

1

Foundation: Understanding Key-Value Storage

Concept: Learn what key-value pairs are and why they are useful.

A key-value pair stores two linked pieces of data: a key (like a name) and a value (like a phone number). For example, in a phonebook, the name is the key and the phone number is the value. This lets you find the phone number by knowing the name.

Result

You understand that data can be organized by unique keys to quickly find related values.

Knowing key-value pairs is the base for all map-like data structures, including HashMaps.
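The phonebook example can be sketched in C as an array of pairs searched one by one. The names `PhoneEntry` and `phonebook_find` are hypothetical; the point is only to show a key, a value, and the linear search that a HashMap is meant to avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;    /* key */
    const char *phone;   /* value */
} PhoneEntry;

/* Linear search: compare the key against each pair until one matches.
   Returns the phone number, or NULL if the name is not present. */
const char *phonebook_find(const PhoneEntry *book, size_t n, const char *name) {
    assert(book != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(book[i].name, name) == 0) return book[i].phone;
    }
    return NULL;
}
```

The cost of this lookup grows with the number of pairs, which is the problem hashing addresses.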

2

Foundation: Arrays as Storage Backbones

3

Intermediate: Hash Functions: Keys to Indexes

4

Intermediate: Handling Collisions with Buckets

5

Intermediate: Basic HashMap Operations

6

Advanced: Resizing and Rehashing the Map

7

Expert: Optimizing Collisions with Trees

Under the Hood

A HashMap stores data in an array where each position is a bucket. The hash function converts keys into integers, which are then mapped to array indexes using modulo. When collisions occur, entries are stored in a linked list or tree at that index. On insertion, retrieval, or deletion, the hash function directs to the bucket, and the bucket is searched linearly or via tree traversal. When the load factor (items/array size) exceeds a threshold, the array is resized and all entries are rehashed to new positions.
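The hash-then-modulo step and the load factor can be sketched as follows. The string hash used here is djb2, chosen only as a well-known simple example and not because the text prescribes it; the function names `hash_string`, `bucket_index`, and `load_factor` are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* djb2 string hash: start at 5381, then hash = hash * 33 + byte.
   Unsigned arithmetic wraps on overflow, so the result is never negative. */
unsigned long hash_string(const char *key) {
    unsigned long hash = 5381;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash = hash * 33 + *p;
    }
    return hash;
}

/* Reduce the hash code to a valid array index with modulo. */
size_t bucket_index(const char *key, size_t capacity) {
    assert(capacity > 0);
    return (size_t)(hash_string(key) % capacity);
}

/* Load factor = stored items / array size. */
double load_factor(size_t count, size_t capacity) {
    assert(capacity > 0);
    return (double)count / (double)capacity;
}
```

Using an unsigned hash type is a deliberate choice in this sketch: taking the modulo of a negative signed value in C can yield a negative index.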

Why designed this way?

HashMaps were designed to provide average constant-time operations by combining arrays and hashing. Arrays offer fast index access, but keys are not numeric indexes, so hashing maps keys to indexes. Collisions are inevitable because many possible keys must share a limited number of array positions, so buckets handle them. Resizing balances memory use and speed. Balanced trees used alone take O(log n) time per operation, which is slower on average than a HashMap's expected O(1), although trees do keep keys in sorted order.

HashMap Internal Flow:

[Key] --hash_function--> [Hash Code] --mod array_size--> [Index]

Array:
┌─────────────┐
│ Index 0     │──> Bucket (Linked List / Tree)
│ Index 1     │──> Bucket
│ Index 2     │──> Bucket
│ ...         │
│ Index N     │──> Bucket
└─────────────┘

Operations:
Insert/Get/Delete
  ↓
Compute index
  ↓
Access bucket
  ↓
Search bucket for key
  ↓
Perform operation
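The flow above, applied to a lookup, might look like the sketch below. It assumes string keys, `int` values, linked-list buckets, and a djb2 hash; `Entry`, `hash_string`, and `hashmap_get` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct Entry {
    const char *key;
    int value;
    struct Entry *next;   /* next entry in the same bucket */
} Entry;

/* djb2 string hash (one common choice, assumed for this sketch). */
unsigned long hash_string(const char *key) {
    unsigned long hash = 5381;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash = hash * 33 + *p;
    }
    return hash;
}

/* Look up `key`. On success, store the value in *out and return true. */
bool hashmap_get(Entry *const *buckets, size_t capacity,
                 const char *key, int *out) {
    assert(buckets != NULL && capacity > 0 && out != NULL);
    size_t index = hash_string(key) % capacity;        /* compute index */
    const Entry *e = buckets[index];                   /* access bucket */
    for (; e != NULL; e = e->next) {                   /* search bucket */
        if (strcmp(e->key, key) == 0) {
            *out = e->value;                           /* perform operation */
            return true;
        }
    }
    return false;                                      /* key not present */
}
```

Insert and delete follow the same first three steps and differ only in what they do once the bucket has been searched.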

Myth Busters - 4 Common Misconceptions

Quick: does a hash function guarantee unique indexes for different keys? Commit to yes or no.

Common Belief: A hash function always gives a unique index for each key, so collisions never happen.

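Reality: No. Different keys can hash to the same index, so collisions do happen and each bucket must be able to hold more than one entry.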
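A collision is easy to reproduce with a deliberately weak hash. The sketch below uses a hypothetical `toy_hash` that just adds up the bytes of the key (an assumption for illustration, not a recommended hash function), so any two keys made of the same characters share an index.

```c
#include <assert.h>
#include <stddef.h>

/* Toy hash for illustration only: the sum of the key's bytes.
   Keys containing the same bytes in a different order always collide. */
unsigned long toy_hash(const char *key) {
    unsigned long sum = 0;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        sum += *p;
    }
    return sum;
}

/* Map the toy hash onto an array of `capacity` buckets. */
size_t toy_index(const char *key, size_t capacity) {
    assert(capacity > 0);
    return (size_t)(toy_hash(key) % capacity);
}
```

With this hash, "ab" and "ba" both sum to 195 in ASCII, so they land in the same bucket for every array size. Better hash functions make collisions rarer, but with more possible keys than buckets they cannot remove them.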

Quick: do you think resizing a HashMap happens every time you add a new item? Commit to yes or no.

Common Belief: The HashMap resizes its array every time a new key-value pair is added.

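Reality: No. The array is resized only when the load factor (items / array size) exceeds a threshold, so most insertions do not trigger a resize.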
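To see how rarely resizing happens, the following sketch counts resizes during a run of insertions. It assumes the array doubles whenever the load factor exceeds a threshold; the name `count_resizes`, the doubling policy, and checking after each insert are illustrative assumptions rather than a fixed rule.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate `inserts` insertions into a table that starts with `capacity`
   slots and doubles whenever count / capacity exceeds `max_load`.
   Returns how many resizes took place. */
size_t count_resizes(size_t inserts, size_t capacity, double max_load) {
    assert(capacity > 0 && max_load > 0.0);
    size_t count = 0;
    size_t resizes = 0;
    for (size_t i = 0; i < inserts; i++) {
        count++;
        if ((double)count / (double)capacity > max_load) {
            capacity *= 2;   /* a real map would also rehash every entry here */
            resizes++;
        }
    }
    return resizes;
}
```

Under these assumptions, 100 insertions into a 16-slot table with a 0.75 threshold trigger 4 resizes (on the 13th, 25th, 49th, and 97th insertion), not 100.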

Quick: do you think linked lists are always the best way to handle collisions? Commit to yes or no.

Common Belief: Linked lists are the only way to handle collisions in HashMaps.

Reality: No. Buckets can also be trees, and some implementations use open addressing (probing) instead of buckets.

Quick: do you think the order of items in a HashMap is always the same as insertion order? Commit to yes or no.

Common Belief: HashMaps keep items in the order they were added.

Reality: No. Entries are placed by hash index, so a plain HashMap makes no guarantee about iteration order.

Expert Zone

1

The quality of the hash function greatly affects performance; poor hash functions cause many collisions and degrade speed.

2

Load factor tuning balances memory use and speed; a lower load factor means faster lookups but more memory.

3

Some implementations use open addressing (probing) instead of buckets, trading off memory and complexity.
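Open addressing can be sketched with linear probing, where a colliding key moves on to the next free slot in the array itself. This is a minimal sketch under stated assumptions: unsigned integer keys hashed by `key % SLOT_COUNT`, a fixed hypothetical table size, and no deletion or resizing. `Slot`, `probe_put`, and `probe_get` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOT_COUNT 8   /* hypothetical fixed table size */

typedef struct {
    bool used;
    unsigned key;
    int value;
} Slot;

/* Insert or update a key. Returns false if every slot is taken. */
bool probe_put(Slot table[SLOT_COUNT], unsigned key, int value) {
    assert(table != NULL);
    for (size_t step = 0; step < SLOT_COUNT; step++) {
        size_t i = (key % SLOT_COUNT + step) % SLOT_COUNT;  /* wrap around */
        if (!table[i].used || table[i].key == key) {
            table[i].used = true;
            table[i].key = key;
            table[i].value = value;
            return true;
        }
    }
    return false;
}

/* Follow the same probe sequence; an unused slot means the key is absent. */
bool probe_get(const Slot table[SLOT_COUNT], unsigned key, int *out) {
    assert(table != NULL && out != NULL);
    for (size_t step = 0; step < SLOT_COUNT; step++) {
        size_t i = (key % SLOT_COUNT + step) % SLOT_COUNT;
        if (!table[i].used) return false;
        if (table[i].key == key) {
            *out = table[i].value;
            return true;
        }
    }
    return false;
}
```

No per-entry allocation is needed, which is the memory side of the trade-off. The added complexity shows up in deletion, which this sketch omits: simply clearing a slot would break the probe sequence for keys stored after it.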

When NOT to use

HashMaps are not ideal when order matters; use an insertion-ordered map such as Java's LinkedHashMap, or a tree-based map when keys must stay sorted. For small datasets or when memory is very limited, simpler arrays or lists may be better. When keys are complex objects without good hash functions, consider other structures like balanced trees.

Production Patterns

In production, HashMaps are used for caches, symbol tables, and fast lookups. They often combine with concurrency controls for thread safety. Implementations optimize resizing, use high-quality hash functions, and switch bucket structures dynamically to maintain performance.

Connections

Balanced Trees

Alternative collision handling method

Understanding balanced trees helps grasp how some HashMaps optimize buckets for worst-case speed.

Cryptographic Hash Functions

Advanced hash function design

Knowing cryptographic hashes shows how hash functions can be designed for uniform distribution and security.

Cache Memory in Computer Architecture

Similar indexing and collision handling

Cache memory uses similar hashing and collision strategies to quickly find data, linking hardware and software concepts.

Common Pitfalls

#1 Ignoring collisions and overwriting existing entries.

Wrong approach:
void put(HashMap* map, Key key, Value value) {
    int index = hash(key) % map->size;
    map->array[index] = value; // overwrites without checking
}

Correct approach:
void put(HashMap* map, Key key, Value value) {
    int index = hash(key) % map->size;
    Bucket* bucket = map->array[index];
    // Search bucket for key
    // If found, update value
    // Else, add new entry to bucket
}

Root cause: Misunderstanding that multiple keys can map to the same index and need separate storage.
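A fuller version of the collision-aware insert is sketched here, assuming string keys, `int` values, linked-list buckets, and a djb2 hash. The type and function names (`Entry`, `HashMap`, `hash_string`, `hashmap_put`) are assumptions and differ from the `Key`, `Value`, and `Bucket` placeholders used in the outline above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct Entry {
    const char *key;      /* borrowed pointer; the caller keeps the key alive */
    int value;
    struct Entry *next;
} Entry;

typedef struct {
    Entry **buckets;
    size_t capacity;
    size_t count;
} HashMap;

/* djb2 string hash (one common choice, assumed for this sketch). */
unsigned long hash_string(const char *key) {
    unsigned long hash = 5381;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash = hash * 33 + *p;
    }
    return hash;
}

/* Insert a new pair or update an existing key.
   Returns false only if allocating a new entry fails. */
bool hashmap_put(HashMap *map, const char *key, int value) {
    assert(map != NULL && map->capacity > 0);
    size_t index = hash_string(key) % map->capacity;

    /* Search the bucket: if the key is already present, update in place. */
    for (Entry *e = map->buckets[index]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;
            return true;
        }
    }

    /* Not found: add a new entry at the head of the bucket's list. */
    Entry *entry = malloc(sizeof *entry);
    if (entry == NULL) return false;
    entry->key = key;
    entry->value = value;
    entry->next = map->buckets[index];
    map->buckets[index] = entry;
    map->count++;
    return true;
}
```

Comparing keys with `strcmp` rather than comparing indexes is what keeps two colliding keys from overwriting each other.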

#2 Not resizing the HashMap, leading to slow performance.

Wrong approach:
void put(HashMap* map, Key key, Value value) {
    // No check for load factor or resizing
    // Insert directly
}

Correct approach:
void put(HashMap* map, Key key, Value value) {
    if ((float)map->count / map->size > LOAD_FACTOR_THRESHOLD) {
        resize(map);
    }
    // Insert after resizing
}

Root cause: Ignoring the need to maintain a low load factor for performance.
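The resize step itself might look like the sketch below. It assumes linked-list buckets, a bucket array allocated with `malloc`, a djb2 hash, and a doubling growth policy; `hashmap_resize` and the other names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Entry {
    const char *key;
    int value;
    struct Entry *next;
} Entry;

typedef struct {
    Entry **buckets;      /* must have been allocated with malloc */
    size_t capacity;
    size_t count;
} HashMap;

/* djb2 string hash (one common choice, assumed for this sketch). */
unsigned long hash_string(const char *key) {
    unsigned long hash = 5381;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        hash = hash * 33 + *p;
    }
    return hash;
}

/* Double the bucket array and relink every entry at its new index.
   Returns false, leaving the map unchanged, if allocation fails. */
bool hashmap_resize(HashMap *map) {
    assert(map != NULL && map->capacity > 0);
    size_t new_capacity = map->capacity * 2;
    Entry **new_buckets = malloc(new_capacity * sizeof *new_buckets);
    if (new_buckets == NULL) return false;
    for (size_t i = 0; i < new_capacity; i++) new_buckets[i] = NULL;

    for (size_t i = 0; i < map->capacity; i++) {
        Entry *e = map->buckets[i];
        while (e != NULL) {
            Entry *next = e->next;   /* save before relinking */
            size_t index = hash_string(e->key) % new_capacity;
            e->next = new_buckets[index];
            new_buckets[index] = e;
            e = next;
        }
    }

    free(map->buckets);
    map->buckets = new_buckets;
    map->capacity = new_capacity;
    return true;
}
```

Entries are relinked rather than copied, so the number of stored entries stays the same; only the index each key maps to changes, because the modulus changed.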

#3 Assuming iteration order matches insertion order.

Wrong approach:
for (int i = 0; i < map->size; i++) {
    // Print entries assuming insertion order
}

Correct approach:
for (int i = 0; i < map->size; i++) {
    // Print entries in bucket order, no order guarantee
}

Root cause: Misunderstanding that HashMaps do not preserve order.
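Iteration over a chained HashMap can be sketched as a walk over every bucket in index order. The function `hashmap_keys` and the `Entry` layout are assumptions for illustration; the function collects keys in the order they are encountered.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Entry {
    const char *key;
    int value;
    struct Entry *next;
} Entry;

/* Copy up to `max_out` keys into `out`, visiting bucket 0 first, then
   bucket 1, and so on. Returns the number of keys written. The resulting
   order depends on where keys hashed, not on when they were inserted. */
size_t hashmap_keys(Entry *const *buckets, size_t capacity,
                    const char **out, size_t max_out) {
    assert(buckets != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < capacity; i++) {
        for (const Entry *e = buckets[i]; e != NULL && n < max_out; e = e->next) {
            out[n++] = e->key;
        }
    }
    return n;
}
```

Code that needs insertion order has to track it separately, for example by also storing entries in a list as they are added.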

Key Takeaways

HashMaps store key-value pairs using a hash function to quickly find data without searching all items.

Collisions happen when different keys map to the same index; buckets like linked lists or trees handle them.

Resizing the underlying array and rehashing entries keeps HashMaps fast as they grow.

Some HashMap implementations switch crowded buckets from linked lists to trees so that lookups stay fast even when many keys collide.

Understanding HashMaps is essential for efficient data storage and retrieval in many software systems.