0
0
MySQLquery~15 mins

Why indexes speed up queries in MySQL - Why It Works This Way

Choose your learning style9 modes available
Overview - Why indexes speed up queries
What is it?
Indexes are special data structures in a database that help find information quickly. Instead of searching every row in a table, an index lets the database jump directly to the data you want. Think of it like a book's index that points you to the exact page instead of flipping through every page. This makes queries much faster, especially in large tables.
Why it matters
Without indexes, databases would have to look through every row to find matching data, which can be very slow and inefficient. This would make websites and apps feel sluggish, especially when dealing with lots of data. Indexes solve this by making data retrieval fast and smooth, improving user experience and saving computing resources.
Where it fits
Before learning about indexes, you should understand basic database concepts like tables, rows, and queries. After indexes, you can learn about query optimization, execution plans, and advanced indexing techniques like composite and full-text indexes.
Mental Model
Core Idea
An index is like a shortcut map that lets the database find data quickly without scanning every row.
Think of it like...
Imagine looking for a friend's phone number in a phone book. Instead of reading every name, you use the alphabetical index to jump straight to the right page. This saves time and effort, just like a database index speeds up data searches.
Table: Users
┌─────────────┬─────────────┐
│ ID (Index)  │ Name        │
├─────────────┼─────────────┤
│ 1           │ Alice       │
│ 2           │ Bob         │
│ 3           │ Charlie     │
└─────────────┴─────────────┘

Index Structure:
[ID]
 ├─ 1 → Row 1
 ├─ 2 → Row 2
 └─ 3 → Row 3

Query: Find user with ID=2
Without index: Check every row until ID=2 found
With index: Go directly to Row 2
Build-Up - 7 Steps
1
FoundationWhat is a database index
🤔
Concept: Introduce the basic idea of an index as a data structure that helps find rows faster.
A database index is like a list that stores key values and pointers to the actual data rows. Instead of scanning the whole table, the database uses the index to jump directly to the matching rows. This is similar to how an index in a book points to pages where a topic appears.
Result
Queries that use indexed columns run faster because the database avoids scanning all rows.
Understanding that an index is a separate structure that points to data rows is key to grasping how it speeds up queries.
2
FoundationHow queries work without indexes
🤔
Concept: Explain the default behavior of scanning all rows when no index exists.
When you run a query without an index on the searched column, the database checks every row one by one to find matches. This is called a full table scan. For small tables, this is okay, but for large tables, it becomes very slow.
Result
Query time grows linearly with the number of rows, making searches slow on big tables.
Knowing that full table scans check every row helps you see why indexes are needed for speed.
3
IntermediateHow indexes organize data internally
🤔Before reading on: do you think indexes store full rows or just keys and pointers? Commit to your answer.
Concept: Introduce the internal structure of indexes, focusing on B-trees as the common method.
Most database indexes use a B-tree structure. This is a balanced tree where each node contains keys and pointers to child nodes or data rows. The tree is sorted, so the database can quickly decide which branch to follow to find a key. This reduces search steps from scanning all rows to just a few steps down the tree.
Result
Searching an index takes logarithmic time, much faster than scanning all rows.
Understanding the B-tree structure explains why indexes speed up searches so dramatically.
4
IntermediateDifference between clustered and non-clustered indexes
🤔Before reading on: do you think clustered indexes store data differently than non-clustered? Commit to your answer.
Concept: Explain the two main types of indexes and how they affect data storage and retrieval.
A clustered index sorts and stores the actual table data rows in the order of the index key. There can be only one clustered index per table. A non-clustered index stores keys and pointers to the data rows, which remain in their original order. Non-clustered indexes can be many per table.
Result
Clustered indexes speed up range queries and sorting, while non-clustered indexes speed up lookups on specific columns.
Knowing the difference helps you choose the right index type for your queries.
5
IntermediateHow indexes speed up different query types
🤔Before reading on: do you think indexes help only with exact matches or also with range queries? Commit to your answer.
Concept: Show how indexes improve performance for exact matches, range queries, and sorting.
Indexes help exact match queries by quickly locating the key. For range queries (e.g., values between X and Y), the B-tree allows scanning only the relevant part of the tree. Indexes also help ORDER BY clauses by providing data already sorted by the indexed column.
Result
Queries with WHERE, BETWEEN, and ORDER BY on indexed columns run much faster.
Understanding the variety of queries helped by indexes shows their broad usefulness.
6
AdvancedTrade-offs and costs of using indexes
🤔Before reading on: do you think adding more indexes always improves performance? Commit to your answer.
Concept: Explain that indexes speed up reads but add overhead to writes and storage.
While indexes make SELECT queries faster, they slow down INSERT, UPDATE, and DELETE operations because the index must be updated too. Indexes also consume extra disk space. Therefore, adding too many indexes can hurt overall performance.
Result
Balanced indexing improves read speed without excessive write slowdown or storage use.
Knowing the costs of indexes helps you design efficient databases that balance speed and resource use.
7
ExpertHow index statistics guide query optimization
🤔Before reading on: do you think the database always uses indexes if they exist? Commit to your answer.
Concept: Reveal how the database uses index statistics to decide whether to use an index or not.
Databases collect statistics about index data distribution and table size. The query optimizer uses these stats to estimate if using an index is faster than a full scan. Sometimes, for very small tables or queries returning many rows, the optimizer skips the index. Understanding this helps explain why indexes are not always used.
Result
Query plans adapt dynamically, choosing the fastest method based on data and query.
Understanding index statistics and optimizer decisions reveals the complexity behind query speed.
Under the Hood
Indexes are stored as balanced tree structures (usually B-trees) that keep keys sorted and allow fast navigation. Each node contains keys and pointers to child nodes or data rows. When a query searches for a key, the database traverses the tree from root to leaf, choosing branches based on key comparisons. This reduces the search from scanning all rows to a few steps. Clustered indexes store data rows in the tree leaves, while non-clustered indexes store pointers to rows elsewhere. The database maintains these structures during data changes, updating nodes and balancing the tree as needed.
Why designed this way?
B-trees were chosen because they keep data balanced and sorted, minimizing disk reads by grouping keys in nodes. This design reduces the number of disk accesses, which are slow compared to memory operations. Alternatives like hash indexes are faster for exact matches but don't support range queries or sorting. The tradeoff favors B-trees for general-purpose indexing. Clustered indexes improve range query speed by storing data physically sorted, but limit to one per table. This design balances read speed, write overhead, and storage efficiency.
Index Search Flow
┌───────────┐
│ Root Node │
├───────────┤
│ Keys: 10, 20, 30 │
│ Pointers to child nodes │
└─────┬─────┘
      │
      ▼
┌───────────────┐
│ Child Node 1  │
│ Keys: 1, 5, 9 │
│ Pointers to data rows │
└───────────────┘

Search for key=5:
Start at Root Node → key 5 < 10 → go to Child Node 1 → find key 5 → get data row pointer
Myth Busters - 4 Common Misconceptions
Quick: Do indexes always make every query faster? Commit to yes or no.
Common Belief:Indexes always speed up all queries without any downside.
Tap to reveal reality
Reality:Indexes speed up read queries but slow down write operations because they need updating. Also, for small tables or queries returning many rows, indexes might not help.
Why it matters:Ignoring write overhead can cause slow data inserts and updates, hurting overall application performance.
Quick: Do you think an index stores the entire table data? Commit to yes or no.
Common Belief:An index contains full copies of the table data for faster access.
Tap to reveal reality
Reality:Indexes store only keys and pointers to data rows, not full data copies. This keeps indexes smaller and faster to search.
Why it matters:Misunderstanding this can lead to expecting indexes to save storage or replace tables, which they do not.
Quick: Do you think the database always uses an index if it exists? Commit to yes or no.
Common Belief:If an index exists on a column, the database will always use it for queries on that column.
Tap to reveal reality
Reality:The query optimizer decides whether to use an index based on statistics. Sometimes it chooses a full scan if that is faster.
Why it matters:Assuming indexes are always used can confuse troubleshooting when queries run slowly despite indexes.
Quick: Do you think adding more indexes always improves performance? Commit to yes or no.
Common Belief:Adding many indexes always makes queries faster.
Tap to reveal reality
Reality:Too many indexes slow down data modifications and increase storage use, hurting overall performance.
Why it matters:Over-indexing can degrade system performance and increase maintenance complexity.
Expert Zone
1
Index selectivity, the uniqueness of keys, greatly affects index usefulness; low selectivity indexes may not improve query speed.
2
Covering indexes that include all columns needed by a query can eliminate the need to access the table data, speeding up queries further.
3
Fragmentation of indexes over time can degrade performance, requiring periodic maintenance like rebuilding or reorganizing indexes.
When NOT to use
Indexes are less useful for very small tables where full scans are fast, or for columns with very low uniqueness (e.g., boolean flags). In such cases, relying on full scans or other optimization techniques like caching may be better.
Production Patterns
In real systems, DBAs carefully choose indexes based on query patterns and monitor usage with tools. Composite indexes combine multiple columns for complex queries. Partial and filtered indexes optimize specific subsets of data. Index maintenance tasks like rebuilding are scheduled during low-traffic periods.
Connections
Hash Tables
Indexes and hash tables both provide fast key-based lookup but differ in structure and use cases.
Understanding hash tables helps grasp why some indexes use hashing for exact matches but prefer trees for range queries.
Library Catalog Systems
Indexes in databases are like library card catalogs that organize books by author or subject for quick retrieval.
Knowing how libraries organize information helps understand the purpose and design of database indexes.
File System Directories
Both indexes and file system directories map names to locations, enabling fast access to files or data rows.
Recognizing this similarity shows how indexing is a general solution for efficient data retrieval.
Common Pitfalls
#1Creating indexes on every column without considering query patterns.
Wrong approach:CREATE INDEX idx_all ON users(name, email, age, city);
Correct approach:CREATE INDEX idx_name_email ON users(name, email);
Root cause:Misunderstanding that indexes have maintenance costs and should target columns used in queries.
#2Expecting indexes to speed up queries that return most of the table.
Wrong approach:SELECT * FROM orders WHERE status IN ('shipped', 'delivered'); -- with index on status
Correct approach:Allow full table scan or redesign query; indexes less effective when many rows match.
Root cause:Not realizing that indexes help most when filtering to few rows, not large result sets.
#3Using indexes on columns with very low uniqueness like boolean flags.
Wrong approach:CREATE INDEX idx_active ON users(active); -- active is true/false
Correct approach:Avoid index on low-selectivity columns; rely on full scan or combine with other columns.
Root cause:Ignoring index selectivity reduces index effectiveness.
Key Takeaways
Indexes are special data structures that let databases find rows quickly without scanning the whole table.
They work like a book's index or phone book, pointing directly to the data you want.
Most indexes use balanced trees (B-trees) to keep keys sorted and enable fast searches.
While indexes speed up reads, they add overhead to writes and consume extra storage, so use them wisely.
The database optimizer decides when to use indexes based on data statistics, not always automatically.