Overview - Index-only scans mental model

What is it?

An index-only scan is a way PostgreSQL can answer a query using just the data stored in an index, without reading the full table rows. It happens when all the requested columns are available in the index itself. This makes the query faster because it avoids extra work reading the main table data. It is a special optimization that saves time and resources.

Why it matters

Without index-only scans, PostgreSQL must read both the index and the table data to answer queries, which takes more time and slows down applications. Index-only scans reduce disk reads and speed up queries, especially on large tables. This means users get faster responses and servers handle more work efficiently.

Where it fits

Before learning index-only scans, you should understand basic indexing and how PostgreSQL uses indexes to find data. After this, you can explore query planning and execution details, and advanced indexing techniques like covering indexes and bitmap index scans.

Mental Model

Core Idea

An index-only scan answers a query using just the index data, skipping the main table to save time.

Think of it like...

Imagine you want to find a book's title and author in a library. Normally, you find the book's shelf location in the catalog (index) and then walk to the shelf to read the book (table). An index-only scan is like having a catalog that already lists the title and author, so you don't need to go to the shelf at all.

┌───────────────┐       ┌───────────────┐
│   Query       │       │   Index       │
│ (columns)     │──────▶│ (contains all │
│               │       │  needed data) │
└───────────────┘       └───────────────┘
                             │
                             │
                             ▼
                      (No table access)
                             │
                             ▼
                      ┌───────────────┐
                      │  Result set   │
                      └───────────────┘

Build-Up - 7 Steps

1

FoundationWhat is an index in PostgreSQL

Concept: Introduce the idea of an index as a data structure that helps find rows quickly.

In PostgreSQL, an index is like a sorted list or map that helps the database find rows without scanning the whole table. It stores key column values and pointers to the full rows. For example, an index on a 'name' column lets PostgreSQL quickly find all rows with a certain name.

Result

Queries using indexed columns run faster because PostgreSQL can jump directly to matching rows.

Understanding indexes is essential because index-only scans build on the idea that indexes hold useful data to speed up queries.

2

FoundationHow PostgreSQL normally uses indexes

3

IntermediateWhat is an index-only scan

4

IntermediateWhen can index-only scans be used

5

IntermediateHow PostgreSQL tracks visibility for index-only scans

6

AdvancedPerformance benefits and trade-offs of index-only scans

7

ExpertSurprises and internals of index-only scans

Under the Hood

PostgreSQL stores indexes as separate data structures containing key columns and pointers to table rows. For index-only scans, the index must include all columns requested by the query. PostgreSQL checks the visibility map to confirm that the table pages referenced have only visible tuples. If so, it reads data directly from the index without accessing the heap (table). This avoids random disk I/O to the table, reducing latency. The visibility map is updated by vacuum processes that mark pages as all-visible when no tuples are deleted or updated.

Why designed this way?

Index-only scans were introduced to optimize read performance by reducing expensive table access. The design leverages existing index structures and visibility maps to safely skip heap reads. Alternatives like always reading the table after the index were slower. The visibility map balances correctness with performance by tracking page visibility without scanning the entire table each time. This design fits PostgreSQL's MVCC model and vacuuming approach.

┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Query       │──────▶│   Index       │──────▶│ Visibility    │
│ (columns)     │       │ (all needed   │       │ Map Check     │
└───────────────┘       │ columns + TIDs│       └───────────────┘
                        └───────────────┘               │
                                                      Yes│
                                                       ▼
                                               ┌───────────────┐
                                               │ Return data   │
                                               │ from index    │
                                               └───────────────┘
                                                      ▲
                                                      │No
                                                      ▼
                                               ┌───────────────┐
                                               │ Read table    │
                                               │ rows for data │
                                               └───────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Do index-only scans always skip reading the table? Commit to yes or no.

Common Belief:Index-only scans never read the table data; they always use only the index.

Tap to reveal reality

Quick: Can any index support index-only scans? Commit to yes or no.

Common Belief:Any index can be used for index-only scans regardless of its columns.

Tap to reveal reality

Quick: Do index-only scans return stale or incorrect data? Commit to yes or no.

Common Belief:Index-only scans might return outdated data because they skip the table.

Tap to reveal reality

Quick: Does vacuuming affect index-only scans? Commit to yes or no.

Common Belief:Vacuuming has no impact on index-only scans.

Tap to reveal reality

Expert Zone

1

Index-only scans depend heavily on the visibility map's accuracy, which means frequent vacuuming is critical for maintaining their performance benefits.

2

Covering indexes designed for index-only scans often include extra columns beyond the query's WHERE clause to enable more queries to use index-only scans.

3

PostgreSQL's MVCC model and HOT (Heap-Only Tuple) updates interact with index-only scans, sometimes delaying visibility map updates and affecting scan efficiency.

When NOT to use

Index-only scans are not suitable when queries request columns not present in the index or when the visibility map is outdated due to infrequent vacuuming. In such cases, traditional index scans or bitmap heap scans are better. Also, for write-heavy workloads where vacuuming can't keep up, relying on index-only scans may degrade performance.

Production Patterns

In production, DBAs create covering indexes tailored to frequent queries to maximize index-only scan usage. They schedule regular vacuuming to keep visibility maps fresh. Monitoring tools track index-only scan usage and visibility map effectiveness to tune maintenance. Complex queries may combine index-only scans with bitmap index scans for optimal performance.

Connections

Covering Indexes

Builds-on

Understanding index-only scans clarifies why covering indexes include all needed columns, enabling queries to be answered entirely from the index.

Multi-Version Concurrency Control (MVCC)

Depends on

Index-only scans rely on MVCC's visibility rules and the visibility map to ensure data correctness without reading the table.

Cache Optimization in Operating Systems

Similar pattern

Both index-only scans and OS cache optimizations aim to reduce expensive data access by using smaller, faster data structures to serve requests.

Common Pitfalls

#1Expecting index-only scans to always improve query speed regardless of index design.

Wrong approach:CREATE INDEX idx ON table(column1); -- expecting index-only scan on queries selecting column2

Correct approach:CREATE INDEX idx ON table(column1, column2); -- includes all needed columns for index-only scan

Root cause:Misunderstanding that index-only scans require all requested columns to be in the index.

#2Neglecting vacuuming, causing visibility map to be outdated and index-only scans to fall back to table reads.

Wrong approach:-- No regular vacuuming scheduled; relying on autovacuum only

Correct approach:VACUUM ANALYZE table; -- run regularly to update visibility map

Root cause:Not realizing vacuuming maintains the visibility map critical for index-only scan efficiency.

#3Assuming index-only scans can return data faster even when the visibility map is missing or stale.

Wrong approach:SELECT column FROM table WHERE indexed_column = 'value'; -- expecting index-only scan always

Correct approach:Understand that PostgreSQL may perform a regular index scan plus table read if visibility map is not set.

Root cause:Ignoring the role of the visibility map in enabling index-only scans.

Key Takeaways

Index-only scans let PostgreSQL answer queries using just the index, skipping the table for faster results.

They work only when all requested columns are in the index and the visibility map confirms data visibility.

The visibility map is maintained by vacuuming, making regular maintenance essential for performance.

Index-only scans never return incorrect data because PostgreSQL falls back to reading the table if needed.

Designing covering indexes and understanding MVCC are key to leveraging index-only scans effectively.