Introduction
Finding data quickly in a large database can be like searching for a needle in a haystack. Hash indexes solve this problem by organizing data so that the database can jump directly to the needed information without scanning everything.
Jump into concepts and practice - no test required
Imagine a large library where books are stored in numbered lockers. Each book's title is converted into a locker number using a special formula. When you want a book, you use the formula to find the locker directly instead of searching every shelf.
┌───────────────┐
│ Search Key │
└──────┬────────┘
│ Hash Function
▼
┌───────────────┐
│ Hash Value │
└──────┬────────┘
│ Points to
▼
┌───────────────┐
│ Bucket │
│ ┌───────────┐ │
│ │ Record 1 │ │
│ │ Record 2 │ │
│ │ ... │ │
│ └───────────┘ │
└───────────────┘class HashIndex: def __init__(self, size=10): self.size = size self.buckets = [[] for _ in range(size)] def hash_function(self, key): return hash(key) % self.size def insert(self, key, value): index = self.hash_function(key) # Check if key exists and update for i, (k, v) in enumerate(self.buckets[index]): if k == key: self.buckets[index][i] = (key, value) return # Otherwise, add new self.buckets[index].append((key, value)) def search(self, key): index = self.hash_function(key) for k, v in self.buckets[index]: if k == key: return v return None # Example usage index = HashIndex() index.insert('apple', 'A fruit') index.insert('car', 'A vehicle') print(index.search('apple')) print(index.search('car')) print(index.search('banana'))
hash index in a database?user_id in SQL (assuming the database supports hash indexes)?CREATE INDEX with USING HASH to specify the index type.USING HASH after the index name and before the column list.email column:SELECT * FROM users WHERE email = 'alice@example.com';
phone_number column but notices that queries with LIKE '%1234' are slow. What is the most likely reason?LIKE '%1234' pattern searches for suffix matches, which hash indexes cannot optimize.customer_id column, but also need to efficiently query ranges of order_date. Which indexing strategy is best?customer_id.order_date.customer_id and a B-tree index on order_date -> Option A