0
0
PostgreSQLquery~15 mins

Covering indexes with INCLUDE in PostgreSQL - Deep Dive

Choose your learning style9 modes available
Overview - Covering indexes with INCLUDE
What is it?
Covering indexes with INCLUDE is a technique in PostgreSQL where an index stores extra columns alongside the indexed columns. These extra columns are not used for searching but are stored to allow queries to get all needed data directly from the index. This helps avoid going back to the main table, making queries faster.
Why it matters
Without covering indexes, the database often has to look up the main table after finding matching rows in the index, which slows down queries. Covering indexes reduce this extra step, improving speed and efficiency, especially for read-heavy applications. This means faster responses and less work for the database.
Where it fits
Before learning covering indexes, you should understand basic indexing and how PostgreSQL uses indexes to speed up searches. After this, you can explore advanced indexing strategies, query optimization, and performance tuning.
Mental Model
Core Idea
A covering index stores extra columns so queries can get all needed data from the index alone, avoiding extra table lookups.
Think of it like...
Imagine a library index card that not only tells you where a book is but also includes a summary of the book's key points. You can decide if the book is useful without fetching it from the shelf.
┌───────────────┐
│ Index Key Col │
├───────────────┤
│ Included Cols │
└───────────────┘

Query uses index key to find rows,
then reads included columns directly,
no need to open main table.
Build-Up - 7 Steps
1
FoundationWhat is an Index in PostgreSQL
🤔
Concept: An index is a data structure that helps the database find rows faster by organizing data for quick search.
Think of an index like a book's table of contents. Instead of reading the whole book, you look at the contents to find the page you want. PostgreSQL creates indexes on columns to speed up queries that search or filter by those columns.
Result
Queries using indexed columns run faster because PostgreSQL can quickly locate matching rows.
Understanding indexes is key because they are the foundation for all query speed improvements in databases.
2
FoundationHow PostgreSQL Uses Indexes to Fetch Data
🤔
Concept: PostgreSQL uses indexes to find row locations, then fetches full data from the main table (heap).
When you run a query with a condition on an indexed column, PostgreSQL looks up the index to find matching row pointers. Then it goes to the main table to get the full row data needed for the query result.
Result
The query is faster than scanning the whole table but still requires accessing the main table after the index lookup.
Knowing that indexes point to rows but don't store all data explains why some queries still read the main table.
3
IntermediateWhat is a Covering Index with INCLUDE
🤔
Concept: A covering index stores extra columns inside the index to avoid fetching data from the main table.
PostgreSQL allows adding non-key columns to an index using INCLUDE. These columns are stored in the index but not used for searching. When a query needs these columns, PostgreSQL can get all data from the index alone.
Result
Queries that select indexed and included columns can run faster because they avoid accessing the main table.
Understanding that indexes can store extra data changes how you design indexes for performance.
4
IntermediateHow to Create an Index with INCLUDE Columns
🤔
Concept: You specify extra columns to include in the index using the INCLUDE clause in CREATE INDEX.
Example: CREATE INDEX idx_users_email ON users(email) INCLUDE (last_login, status); This index uses 'email' as the key and includes 'last_login' and 'status' columns for quick access.
Result
PostgreSQL creates an index that can answer queries filtering by email and selecting last_login and status without touching the main table.
Knowing the syntax and purpose of INCLUDE helps you build indexes tailored to your query needs.
5
IntermediateWhen Queries Benefit from Covering Indexes
🤔Before reading on: Do you think covering indexes help only with WHERE filters or also with SELECT columns? Commit to your answer.
Concept: Covering indexes help when queries filter by indexed columns and select included columns, avoiding extra table reads.
If a query filters by the indexed column and selects only the indexed and included columns, PostgreSQL can answer it fully from the index. For example: SELECT email, last_login FROM users WHERE email = 'abc@example.com'; This query uses the index with INCLUDE columns to avoid reading the main table.
Result
The query runs faster because it reads only the index, which is smaller and more efficient.
Understanding which queries benefit guides you to create indexes that improve real query performance.
6
AdvancedLimitations and Trade-offs of INCLUDE Columns
🤔Before reading on: Do you think including many columns in an index always improves performance? Commit to your answer.
Concept: Including columns increases index size and maintenance cost, so balance is needed.
While INCLUDE columns speed up reads, they make the index bigger and slower to update. Large indexes use more disk space and memory. Also, INCLUDE columns cannot be used for searching or sorting, only for covering queries.
Result
Indexes with many included columns may slow down writes and use more resources, so use INCLUDE wisely.
Knowing the costs prevents over-indexing and helps maintain overall database performance.
7
ExpertHow PostgreSQL Stores and Uses INCLUDE Columns Internally
🤔Before reading on: Do you think INCLUDE columns are stored in the same way as key columns inside the index? Commit to your answer.
Concept: INCLUDE columns are stored in the index leaf pages but are not part of the search key, affecting storage and usage.
PostgreSQL stores key columns in the index tree to organize and search data. INCLUDE columns are stored only in the leaf nodes, so they don't affect the tree structure or search order. This design allows quick access to included data without changing index navigation.
Result
Queries can fetch included columns efficiently, but these columns cannot be used for filtering or ordering.
Understanding this internal design explains why INCLUDE columns improve read performance without affecting index search behavior.
Under the Hood
PostgreSQL builds a B-tree or other index structure using key columns to organize data. INCLUDE columns are stored only in the leaf nodes of the index. When a query uses the index, PostgreSQL navigates the tree using key columns, then reads the leaf node to get both key and included columns. This avoids accessing the main table (heap) if all needed columns are in the index.
Why designed this way?
Storing INCLUDE columns only in leaf nodes keeps the index tree smaller and faster to search. It also avoids complicating the index structure since included columns are not used for searching or sorting. This design balances read speed improvements with manageable index size and update cost.
Index Structure:

┌───────────────┐
│   Root Node   │
│ (Key Columns) │
└──────┬────────┘
       │
┌──────▼────────┐
│ Internal Node │
│ (Key Columns) │
└──────┬────────┘
       │
┌──────▼─────────────────────────────┐
│ Leaf Node                          │
│ ┌───────────────┐ ┌───────────────┐│
│ │ Key Columns   │ │ INCLUDE Cols  ││
│ └───────────────┘ └───────────────┘│
└────────────────────────────────────┘
Myth Busters - 3 Common Misconceptions
Quick: Do you think INCLUDE columns can be used to filter query results like key columns? Commit to yes or no.
Common Belief:INCLUDE columns can be used in WHERE clauses to filter data because they are part of the index.
Tap to reveal reality
Reality:INCLUDE columns are stored only for covering queries and cannot be used to filter or sort results.
Why it matters:Using INCLUDE columns in WHERE clauses will not use the index for filtering, leading to slower queries and confusion.
Quick: Do you think adding many columns to INCLUDE always improves query speed? Commit to yes or no.
Common Belief:Adding more columns to INCLUDE always makes queries faster because more data is in the index.
Tap to reveal reality
Reality:Including many columns increases index size and slows down writes, which can hurt overall performance.
Why it matters:Overusing INCLUDE can cause slower inserts, updates, and more disk usage, harming database health.
Quick: Do you think a covering index completely replaces the need for the main table in all queries? Commit to yes or no.
Common Belief:A covering index means the main table is never accessed for queries using that index.
Tap to reveal reality
Reality:Covering indexes only avoid main table access if the query selects only indexed and included columns; otherwise, the main table is still needed.
Why it matters:Assuming the main table is never accessed can lead to wrong expectations about query performance.
Expert Zone
1
INCLUDE columns do not affect index uniqueness constraints, so unique indexes can include non-key columns without changing uniqueness rules.
2
PostgreSQL does not store INCLUDE columns in the index's internal nodes, which keeps tree traversal efficient even with large INCLUDE sets.
3
Covering indexes with INCLUDE are especially beneficial for read-heavy workloads with frequent queries selecting a small set of columns.
When NOT to use
Avoid using INCLUDE columns when queries need to filter or sort by those columns; instead, create separate indexes on those columns. Also, do not include large or frequently changing columns as INCLUDE because it increases index maintenance cost.
Production Patterns
In production, covering indexes are used to optimize common queries that select a few columns but filter by others, such as user lookups by email including last login time. DBAs monitor index size and write performance to balance benefits.
Connections
Materialized Views
Both store precomputed data to speed up queries, but materialized views store full query results while covering indexes store extra columns in indexes.
Understanding covering indexes helps grasp how databases optimize read performance by storing extra data close to the query path, similar to materialized views but at a lower storage cost.
Cache Memory in CPUs
Covering indexes reduce the need to access slower main table data, similar to how CPU caches store frequently used data to avoid slower memory access.
Knowing this connection highlights the principle of storing data closer to where it's needed to speed up access, a common pattern in computing.
Library Card Catalogs
Covering indexes are like library card catalogs that include summaries, allowing quick decisions without fetching the full book.
This connection shows how adding extra information to an index can reduce the need for expensive lookups, a principle used in many information systems.
Common Pitfalls
#1Including columns that are frequently updated, causing slow write performance.
Wrong approach:CREATE INDEX idx_orders_customer ON orders(customer_id) INCLUDE (order_status, updated_at, large_text_column);
Correct approach:CREATE INDEX idx_orders_customer ON orders(customer_id) INCLUDE (order_status);
Root cause:Misunderstanding that INCLUDE columns increase index size and update cost, especially for large or frequently changed columns.
#2Trying to filter or sort by INCLUDE columns expecting index usage.
Wrong approach:SELECT * FROM users WHERE last_login > '2024-01-01' ORDER BY status; -- last_login and status are only included columns
Correct approach:CREATE INDEX idx_users_last_login ON users(last_login); SELECT * FROM users WHERE last_login > '2024-01-01' ORDER BY status;
Root cause:Confusing INCLUDE columns as searchable keys rather than just stored data for covering.
#3Assuming all queries benefit from covering indexes and adding INCLUDE columns indiscriminately.
Wrong approach:CREATE INDEX idx_products_name ON products(name) INCLUDE (description, price, stock, supplier, category, weight, dimensions);
Correct approach:CREATE INDEX idx_products_name ON products(name) INCLUDE (price, stock);
Root cause:Lack of query analysis leading to over-indexing and bloated indexes.
Key Takeaways
Covering indexes with INCLUDE store extra columns in the index to avoid accessing the main table, speeding up queries.
INCLUDE columns cannot be used for filtering or sorting; they only help cover queries that select those columns.
Adding too many or large INCLUDE columns increases index size and slows down write operations.
Understanding when and how to use INCLUDE helps balance read performance with write cost.
PostgreSQL stores INCLUDE columns only in leaf nodes, keeping index search efficient while providing extra data.