Hash index for equality in PostgreSQL - Deep Dive
Start learning this pattern below
Jump into concepts and practice - no test required
Table: Data values and their hash codes ┌─────────────┬─────────────┐ │ Value │ Hash Code │ ├─────────────┼─────────────┤ │ 'apple' │ 12345 │ │ 'banana' │ 67890 │ │ 'cherry' │ 54321 │ └─────────────┴─────────────┘ Flow: [Query Value] → [Hash Function] → [Hash Code] → [Index Lookup] → [Row Found]
┌───────────────┐
│ Query Value │
└──────┬────────┘
│ Hash Function
▼
┌───────────────┐
│ Hash Code │
└──────┬────────┘
│ Index Lookup
▼
┌───────────────┐
│ Bucket in │
│ Hash Index │
│ (multiple │
│ entries if │
│ collisions) │
└──────┬────────┘
│ Compare actual values
▼
┌───────────────┐
│ Matching Rows │
└───────────────┘Practice
What is the main advantage of using a hash index in PostgreSQL?
Solution
Step 1: Understand hash index purpose
Hash indexes are designed to speed up searches where you look for exact matches (equality) on a column.Step 2: Compare with other index types
Unlike B-tree indexes, hash indexes do not support range queries or ordering, so they are not useful for those.Final Answer:
It speeds up equality searches on a column. -> Option AQuick Check:
Hash index = equality speedup [OK]
- Thinking hash indexes speed up range queries
- Confusing hash indexes with data compression
- Assuming hash indexes handle foreign keys automatically
Which of the following is the correct syntax to create a hash index on the email column of a table named users?
?
Solution
Step 1: Recall hash index syntax
The correct syntax uses CREATE INDEX, specifies the index name, the table, and uses USING hash to indicate a hash index.Step 2: Check each option
CREATE INDEX users_email_hash ON users USING hash (email); matches the correct syntax exactly. The other options have syntax errors or use the wrong index type.Final Answer:
CREATE INDEX users_email_hash ON users USING hash (email); -> Option DQuick Check:
CREATE INDEX ... USING hash ... [OK]
- Using CREATE HASH INDEX instead of CREATE INDEX
- Forgetting 'USING hash' clause
- Using btree instead of hash for hash index
Given the table products(id INT, name TEXT) with a hash index on id, what will the query SELECT * FROM products WHERE id = 10; most likely use?
Solution
Step 1: Identify query condition type
The query uses an equality condition on theidcolumn:id = 10.Step 2: Match index type to query
Since there is a hash index onid, PostgreSQL will use a hash index scan to quickly find rows whereidequals 10.Final Answer:
A hash index scan for fast equality lookup -> Option BQuick Check:
Equality query + hash index = hash index scan [OK]
- Assuming sequential scan always happens
- Confusing bitmap index with hash index
- Thinking hash index supports range queries
Consider the following SQL commands:CREATE TABLE employees(id INT, name TEXT);
CREATE INDEX emp_id_hash ON employees USING hash (id);
SELECT * FROM employees WHERE id > 5;
What is the problem with using the hash index in this query?
Solution
Step 1: Understand hash index limitations
Hash indexes only support equality searches, not range conditions likeid > 5.Step 2: Analyze the query condition
The query uses a range condition, so the hash index cannot be used efficiently here.Final Answer:
Hash indexes do not support range queries likeid > 5. -> Option CQuick Check:
Range query + hash index = no use [OK]
- Thinking hash indexes support range queries
- Believing index names must follow special rules
- Assuming primary key is required for hash index
You have a large table orders(order_id INT, customer_id INT, status TEXT). You often query orders by customer_id with equality conditions, but sometimes you query by status with range-like conditions (e.g., status > 'A'). Which indexing strategy is best?
Solution
Step 1: Match index types to query patterns
Hash indexes are good for equality searches, so use one oncustomer_id. B-tree indexes support range queries, so use one onstatus.Step 2: Evaluate options
Create a hash index oncustomer_idand a B-tree index onstatus. correctly assigns hash index for equality and B-tree for range. Create hash indexes on bothcustomer_idandstatus. wrongly uses hash for range. Create a B-tree index oncustomer_idonly. misses index onstatus. Create no indexes to avoid overhead. ignores performance.Final Answer:
Create a hash index oncustomer_idand a B-tree index onstatus. -> Option AQuick Check:
Equality = hash, range = B-tree [OK]
- Using hash index for range queries
- Not indexing frequently queried columns
- Avoiding indexes due to overhead without reason
