Overview - Why indexes speed up queries

What is it?

Indexes are special data structures in a database that help find information quickly. Instead of searching every row in a table, an index lets the database jump directly to the data you want. Think of it like a book's index that points you to the exact page instead of flipping through every page. This makes queries much faster, especially in large tables.

Why it matters

Without indexes, databases would have to look through every row to find matching data, which can be very slow and inefficient. This would make websites and apps feel sluggish, especially when dealing with lots of data. Indexes solve this by making data retrieval fast and smooth, improving user experience and saving computing resources.

Where it fits

Before learning about indexes, you should understand basic database concepts like tables, rows, and queries. After indexes, you can learn about query optimization, execution plans, and advanced indexing techniques like composite and full-text indexes.

Mental Model

Core Idea

An index is like a shortcut map that lets the database find data quickly without scanning every row.

Think of it like...

Imagine looking for a friend's phone number in a phone book. Instead of reading every name, you use the alphabetical index to jump straight to the right page. This saves time and effort, just like a database index speeds up data searches.

Table: Users
┌─────────────┬─────────────┐
│ ID (Index)  │ Name        │
├─────────────┼─────────────┤
│ 1           │ Alice       │
│ 2           │ Bob         │
│ 3           │ Charlie     │
└─────────────┴─────────────┘

Index Structure:
[ID]
 ├─ 1 → Row 1
 ├─ 2 → Row 2
 └─ 3 → Row 3

Query: Find user with ID=2
Without index: Check every row until ID=2 found
With index: Go directly to Row 2

Build-Up - 7 Steps

1

FoundationWhat is a database index

Concept: Introduce the basic idea of an index as a data structure that helps find rows faster.

A database index is like a list that stores key values and pointers to the actual data rows. Instead of scanning the whole table, the database uses the index to jump directly to the matching rows. This is similar to how an index in a book points to pages where a topic appears.

Result

Queries that use indexed columns run faster because the database avoids scanning all rows.

Understanding that an index is a separate structure that points to data rows is key to grasping how it speeds up queries.

2

FoundationHow queries work without indexes

3

IntermediateHow indexes organize data internally

4

IntermediateDifference between clustered and non-clustered indexes

5

IntermediateHow indexes speed up different query types

6

AdvancedTrade-offs and costs of using indexes

7

ExpertHow index statistics guide query optimization

Under the Hood

Indexes are stored as balanced tree structures (usually B-trees) that keep keys sorted and allow fast navigation. Each node contains keys and pointers to child nodes or data rows. When a query searches for a key, the database traverses the tree from root to leaf, choosing branches based on key comparisons. This reduces the search from scanning all rows to a few steps. Clustered indexes store data rows in the tree leaves, while non-clustered indexes store pointers to rows elsewhere. The database maintains these structures during data changes, updating nodes and balancing the tree as needed.

Why designed this way?

B-trees were chosen because they keep data balanced and sorted, minimizing disk reads by grouping keys in nodes. This design reduces the number of disk accesses, which are slow compared to memory operations. Alternatives like hash indexes are faster for exact matches but don't support range queries or sorting. The tradeoff favors B-trees for general-purpose indexing. Clustered indexes improve range query speed by storing data physically sorted, but limit to one per table. This design balances read speed, write overhead, and storage efficiency.

Index Search Flow
┌───────────┐
│ Root Node │
├───────────┤
│ Keys: 10, 20, 30 │
│ Pointers to child nodes │
└─────┬─────┘
      │
      ▼
┌───────────────┐
│ Child Node 1  │
│ Keys: 1, 5, 9 │
│ Pointers to data rows │
└───────────────┘

Search for key=5:
Start at Root Node → key 5 < 10 → go to Child Node 1 → find key 5 → get data row pointer

Myth Busters - 4 Common Misconceptions

Quick: Do indexes always make every query faster? Commit to yes or no.

Common Belief:Indexes always speed up all queries without any downside.

Tap to reveal reality

Quick: Do you think an index stores the entire table data? Commit to yes or no.

Common Belief:An index contains full copies of the table data for faster access.

Tap to reveal reality

Quick: Do you think the database always uses an index if it exists? Commit to yes or no.

Common Belief:If an index exists on a column, the database will always use it for queries on that column.

Tap to reveal reality

Quick: Do you think adding more indexes always improves performance? Commit to yes or no.

Common Belief:Adding many indexes always makes queries faster.

Tap to reveal reality

Expert Zone

1

Index selectivity, the uniqueness of keys, greatly affects index usefulness; low selectivity indexes may not improve query speed.

2

Covering indexes that include all columns needed by a query can eliminate the need to access the table data, speeding up queries further.

3

Fragmentation of indexes over time can degrade performance, requiring periodic maintenance like rebuilding or reorganizing indexes.

When NOT to use

Indexes are less useful for very small tables where full scans are fast, or for columns with very low uniqueness (e.g., boolean flags). In such cases, relying on full scans or other optimization techniques like caching may be better.

Production Patterns

In real systems, DBAs carefully choose indexes based on query patterns and monitor usage with tools. Composite indexes combine multiple columns for complex queries. Partial and filtered indexes optimize specific subsets of data. Index maintenance tasks like rebuilding are scheduled during low-traffic periods.

Connections

Hash Tables

Indexes and hash tables both provide fast key-based lookup but differ in structure and use cases.

Understanding hash tables helps grasp why some indexes use hashing for exact matches but prefer trees for range queries.

Library Catalog Systems

Indexes in databases are like library card catalogs that organize books by author or subject for quick retrieval.

Knowing how libraries organize information helps understand the purpose and design of database indexes.

File System Directories

Both indexes and file system directories map names to locations, enabling fast access to files or data rows.

Recognizing this similarity shows how indexing is a general solution for efficient data retrieval.

Common Pitfalls

#1Creating indexes on every column without considering query patterns.

Wrong approach:CREATE INDEX idx_all ON users(name, email, age, city);

Correct approach:CREATE INDEX idx_name_email ON users(name, email);

Root cause:Misunderstanding that indexes have maintenance costs and should target columns used in queries.

#2Expecting indexes to speed up queries that return most of the table.

Wrong approach:SELECT * FROM orders WHERE status IN ('shipped', 'delivered'); -- with index on status

Correct approach:Allow full table scan or redesign query; indexes less effective when many rows match.

Root cause:Not realizing that indexes help most when filtering to few rows, not large result sets.

#3Using indexes on columns with very low uniqueness like boolean flags.

Wrong approach:CREATE INDEX idx_active ON users(active); -- active is true/false

Correct approach:Avoid index on low-selectivity columns; rely on full scan or combine with other columns.

Root cause:Ignoring index selectivity reduces index effectiveness.

Key Takeaways

Indexes are special data structures that let databases find rows quickly without scanning the whole table.

They work like a book's index or phone book, pointing directly to the data you want.

Most indexes use balanced trees (B-trees) to keep keys sorted and enable fast searches.

While indexes speed up reads, they add overhead to writes and consume extra storage, so use them wisely.

The database optimizer decides when to use indexes based on data statistics, not always automatically.