Overview - How an index works (B-tree mental model)

What is it?

An index in a database is a special data structure that helps find data quickly without scanning every row. It works like a sorted map that points to where data is stored. The B-tree is a common way to organize this index so searches, inserts, and deletes stay fast even with lots of data. This makes databases respond quickly to queries.

Why it matters

Without indexes, databases would have to look at every row to find what you want, which is very slow for big data. Indexes solve this by letting the database jump directly to the right place. This speeds up searches, sorting, and filtering, making apps and websites faster and more responsive. Without indexes, many modern data-driven services would be too slow to use.

Where it fits

Before learning about indexes, you should understand basic database tables and how queries work. After this, you can learn about query optimization and advanced indexing types like hash indexes or full-text indexes. Indexes fit into the bigger picture of making databases efficient and scalable.

Mental Model

Core Idea

A B-tree index organizes data like a balanced tree that guides the database to the exact location of data quickly by narrowing down choices step-by-step.

Think of it like...

Imagine a phone book organized by last names. Instead of reading every name, you open to a section by letter, then narrow down to the exact name by flipping through smaller sections. The B-tree index works like this phone book, guiding you quickly to the right page.

Root
 ├─ Internal Node 1
 │    ├─ Leaf Node A (data pointers)
 │    └─ Leaf Node B (data pointers)
 └─ Internal Node 2
      ├─ Leaf Node C (data pointers)
      └─ Leaf Node D (data pointers)

Each node contains keys that split the data range, leading down to leaves that point to actual data rows.

Build-Up - 7 Steps

1

FoundationWhat is a database index

Concept: Introduce the basic idea of an index as a tool to speed up data lookup.

A database index is like a shortcut or a table of contents for your data. Instead of searching every row, the database uses the index to find data faster. Think of it as a list that tells where to find things quickly.

Result

Queries that use indexed columns run faster because the database looks up data directly.

Understanding that indexes are shortcuts helps you see why they are essential for performance.

2

FoundationBasics of B-tree structure

3

IntermediateHow B-tree guides data search

4

IntermediateInserting and balancing B-trees

5

IntermediateDeleting keys from B-trees

6

AdvancedWhy B-trees fit disk storage well

7

ExpertSurprising B-tree internals and tradeoffs

Under the Hood

A B-tree index stores keys in nodes arranged in a balanced tree. Each node holds multiple keys and pointers to child nodes or data rows. Searching starts at the root, comparing keys to decide which child to follow, repeating until reaching a leaf node that points to actual data. Inserts and deletes adjust nodes by splitting or merging to keep the tree balanced and shallow. Nodes are sized to match disk pages, minimizing disk reads during traversal.

Why designed this way?

B-trees were designed to optimize disk-based storage where reading large blocks is faster than many small reads. Balancing keeps search times predictable. Alternatives like binary trees are unbalanced and inefficient on disk. Hash indexes are fast for exact matches but don't support range queries well. B-trees offer a good balance for many query types and data sizes.

┌─────────────┐
│    Root     │
│ [10, 20, 30] │
├─────┬───────┤
│     │       │
▼     ▼       ▼
Node1 Node2  Node3
[1-9] [11-19] [21-29]
  │      │       │
Leafs  Leafs   Leafs
  │      │       │
Data   Data    Data

Search compares key to root keys, chooses branch, repeats until leaf.

Myth Busters - 4 Common Misconceptions

Quick: Do you think an index always speeds up every query? Commit yes or no.

Common Belief:Indexes always make queries faster.

Tap to reveal reality

Quick: Do you think B-tree indexes store data inside the tree nodes themselves? Commit yes or no.

Common Belief:B-tree indexes store the actual data inside their nodes.

Tap to reveal reality

Quick: Do you think the B-tree height grows linearly with data size? Commit yes or no.

Common Belief:As data grows, B-tree height grows proportionally, making searches slower.

Tap to reveal reality

Quick: Do you think all database indexes use B-trees? Commit yes or no.

Common Belief:All database indexes are B-trees.

Tap to reveal reality

Expert Zone

1

B-tree node size tuning can drastically affect performance depending on hardware cache sizes and disk page sizes.

2

Partial indexes and covering indexes can reduce I/O by storing only relevant keys or including extra columns in the index.

3

Concurrency control in B-trees involves complex locking or latch-free algorithms to allow many users to read and write safely.

When NOT to use

B-tree indexes are less effective for very high write workloads or when queries only need exact matches without range scans; in such cases, hash indexes or in-memory structures may be better.

Production Patterns

In production, B-tree indexes are combined with query planners that decide when to use them. DBAs monitor index usage and rebuild or reorganize indexes to maintain performance. Composite indexes on multiple columns optimize common query patterns.

Connections

Binary Search Algorithm

B-tree search is a generalization of binary search to multiple keys per node.

Understanding binary search helps grasp how B-trees narrow down search ranges efficiently.

File System Directory Trees

Both organize data hierarchically to allow fast lookup and management.

Knowing how file systems use trees clarifies why balanced trees are good for organizing large data sets.

Decision Trees in Machine Learning

Both use tree structures to split data based on keys or features to reach a conclusion quickly.

Seeing the similarity helps understand how hierarchical decisions speed up complex tasks.

Common Pitfalls

#1Creating indexes on columns that are frequently updated.

Wrong approach:CREATE INDEX idx_name ON table_name(frequently_updated_column);

Correct approach:Avoid indexing columns that change often or use partial indexes if possible.

Root cause:Misunderstanding that indexes add overhead on writes, slowing down updates.

#2Assuming indexes speed up all queries regardless of conditions.

Wrong approach:SELECT * FROM table WHERE non_indexed_column = 'value'; -- expecting fast search

Correct approach:Create indexes on columns used in WHERE clauses or joins to improve performance.

Root cause:Not knowing that indexes only help when queries filter or sort on indexed columns.

#3Using too many indexes on a single table.

Wrong approach:CREATE INDEX idx1 ON table(col1); CREATE INDEX idx2 ON table(col2); CREATE INDEX idx3 ON table(col3);

Correct approach:Create only necessary indexes based on query patterns to balance read and write performance.

Root cause:Believing more indexes always mean faster queries without considering maintenance cost.

Key Takeaways

Indexes are special data structures that speed up data lookup by avoiding full table scans.

B-tree indexes organize keys in a balanced tree that narrows search step-by-step, keeping queries fast even with large data.

The design of B-trees matches disk storage patterns to minimize slow disk reads during searches.

Indexes improve read speed but add overhead to writes, so they must be used thoughtfully.

Understanding B-tree internals helps optimize database performance and troubleshoot indexing issues.