Bird
Raised Fist0
PostgreSQLquery~15 mins

Covering indexes with INCLUDE in PostgreSQL - Deep Dive

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Overview - Covering indexes with INCLUDE
What is it?
Covering indexes with INCLUDE is a technique in PostgreSQL where an index stores extra columns alongside the indexed columns. These extra columns are not used for searching but are stored to allow queries to get all needed data directly from the index. This helps avoid going back to the main table, making queries faster.
Why it matters
Without covering indexes, the database often has to look up the main table after finding matching rows in the index, which slows down queries. Covering indexes reduce this extra step, improving speed and efficiency, especially for read-heavy applications. This means faster responses and less work for the database.
Where it fits
Before learning covering indexes, you should understand basic indexing and how PostgreSQL uses indexes to speed up searches. After this, you can explore advanced indexing strategies, query optimization, and performance tuning.
Mental Model
Core Idea
A covering index stores extra columns so queries can get all needed data from the index alone, avoiding extra table lookups.
Think of it like...
Imagine a library index card that not only tells you where a book is but also includes a summary of the book's key points. You can decide if the book is useful without fetching it from the shelf.
┌───────────────┐
│ Index Key Col │
├───────────────┤
│ Included Cols │
└───────────────┘

Query uses index key to find rows,
then reads included columns directly,
no need to open main table.
Build-Up - 7 Steps
1
FoundationWhat is an Index in PostgreSQL
🤔
Concept: An index is a data structure that helps the database find rows faster by organizing data for quick search.
Think of an index like a book's table of contents. Instead of reading the whole book, you look at the contents to find the page you want. PostgreSQL creates indexes on columns to speed up queries that search or filter by those columns.
Result
Queries using indexed columns run faster because PostgreSQL can quickly locate matching rows.
Understanding indexes is key because they are the foundation for all query speed improvements in databases.
2
FoundationHow PostgreSQL Uses Indexes to Fetch Data
🤔
Concept: PostgreSQL uses indexes to find row locations, then fetches full data from the main table (heap).
When you run a query with a condition on an indexed column, PostgreSQL looks up the index to find matching row pointers. Then it goes to the main table to get the full row data needed for the query result.
Result
The query is faster than scanning the whole table but still requires accessing the main table after the index lookup.
Knowing that indexes point to rows but don't store all data explains why some queries still read the main table.
3
IntermediateWhat is a Covering Index with INCLUDE
🤔
Concept: A covering index stores extra columns inside the index to avoid fetching data from the main table.
PostgreSQL allows adding non-key columns to an index using INCLUDE. These columns are stored in the index but not used for searching. When a query needs these columns, PostgreSQL can get all data from the index alone.
Result
Queries that select indexed and included columns can run faster because they avoid accessing the main table.
Understanding that indexes can store extra data changes how you design indexes for performance.
4
IntermediateHow to Create an Index with INCLUDE Columns
🤔
Concept: You specify extra columns to include in the index using the INCLUDE clause in CREATE INDEX.
Example: CREATE INDEX idx_users_email ON users(email) INCLUDE (last_login, status); This index uses 'email' as the key and includes 'last_login' and 'status' columns for quick access.
Result
PostgreSQL creates an index that can answer queries filtering by email and selecting last_login and status without touching the main table.
Knowing the syntax and purpose of INCLUDE helps you build indexes tailored to your query needs.
5
IntermediateWhen Queries Benefit from Covering Indexes
🤔Before reading on: Do you think covering indexes help only with WHERE filters or also with SELECT columns? Commit to your answer.
Concept: Covering indexes help when queries filter by indexed columns and select included columns, avoiding extra table reads.
If a query filters by the indexed column and selects only the indexed and included columns, PostgreSQL can answer it fully from the index. For example: SELECT email, last_login FROM users WHERE email = 'abc@example.com'; This query uses the index with INCLUDE columns to avoid reading the main table.
Result
The query runs faster because it reads only the index, which is smaller and more efficient.
Understanding which queries benefit guides you to create indexes that improve real query performance.
6
AdvancedLimitations and Trade-offs of INCLUDE Columns
🤔Before reading on: Do you think including many columns in an index always improves performance? Commit to your answer.
Concept: Including columns increases index size and maintenance cost, so balance is needed.
While INCLUDE columns speed up reads, they make the index bigger and slower to update. Large indexes use more disk space and memory. Also, INCLUDE columns cannot be used for searching or sorting, only for covering queries.
Result
Indexes with many included columns may slow down writes and use more resources, so use INCLUDE wisely.
Knowing the costs prevents over-indexing and helps maintain overall database performance.
7
ExpertHow PostgreSQL Stores and Uses INCLUDE Columns Internally
🤔Before reading on: Do you think INCLUDE columns are stored in the same way as key columns inside the index? Commit to your answer.
Concept: INCLUDE columns are stored in the index leaf pages but are not part of the search key, affecting storage and usage.
PostgreSQL stores key columns in the index tree to organize and search data. INCLUDE columns are stored only in the leaf nodes, so they don't affect the tree structure or search order. This design allows quick access to included data without changing index navigation.
Result
Queries can fetch included columns efficiently, but these columns cannot be used for filtering or ordering.
Understanding this internal design explains why INCLUDE columns improve read performance without affecting index search behavior.
Under the Hood
PostgreSQL builds a B-tree or other index structure using key columns to organize data. INCLUDE columns are stored only in the leaf nodes of the index. When a query uses the index, PostgreSQL navigates the tree using key columns, then reads the leaf node to get both key and included columns. This avoids accessing the main table (heap) if all needed columns are in the index.
Why designed this way?
Storing INCLUDE columns only in leaf nodes keeps the index tree smaller and faster to search. It also avoids complicating the index structure since included columns are not used for searching or sorting. This design balances read speed improvements with manageable index size and update cost.
Index Structure:

┌───────────────┐
│   Root Node   │
│ (Key Columns) │
└──────┬────────┘
       │
┌──────▼────────┐
│ Internal Node │
│ (Key Columns) │
└──────┬────────┘
       │
┌──────▼─────────────────────────────┐
│ Leaf Node                          │
│ ┌───────────────┐ ┌───────────────┐│
│ │ Key Columns   │ │ INCLUDE Cols  ││
│ └───────────────┘ └───────────────┘│
└────────────────────────────────────┘
Myth Busters - 3 Common Misconceptions
Quick: Do you think INCLUDE columns can be used to filter query results like key columns? Commit to yes or no.
Common Belief:INCLUDE columns can be used in WHERE clauses to filter data because they are part of the index.
Tap to reveal reality
Reality:INCLUDE columns are stored only for covering queries and cannot be used to filter or sort results.
Why it matters:Using INCLUDE columns in WHERE clauses will not use the index for filtering, leading to slower queries and confusion.
Quick: Do you think adding many columns to INCLUDE always improves query speed? Commit to yes or no.
Common Belief:Adding more columns to INCLUDE always makes queries faster because more data is in the index.
Tap to reveal reality
Reality:Including many columns increases index size and slows down writes, which can hurt overall performance.
Why it matters:Overusing INCLUDE can cause slower inserts, updates, and more disk usage, harming database health.
Quick: Do you think a covering index completely replaces the need for the main table in all queries? Commit to yes or no.
Common Belief:A covering index means the main table is never accessed for queries using that index.
Tap to reveal reality
Reality:Covering indexes only avoid main table access if the query selects only indexed and included columns; otherwise, the main table is still needed.
Why it matters:Assuming the main table is never accessed can lead to wrong expectations about query performance.
Expert Zone
1
INCLUDE columns do not affect index uniqueness constraints, so unique indexes can include non-key columns without changing uniqueness rules.
2
PostgreSQL does not store INCLUDE columns in the index's internal nodes, which keeps tree traversal efficient even with large INCLUDE sets.
3
Covering indexes with INCLUDE are especially beneficial for read-heavy workloads with frequent queries selecting a small set of columns.
When NOT to use
Avoid using INCLUDE columns when queries need to filter or sort by those columns; instead, create separate indexes on those columns. Also, do not include large or frequently changing columns as INCLUDE because it increases index maintenance cost.
Production Patterns
In production, covering indexes are used to optimize common queries that select a few columns but filter by others, such as user lookups by email including last login time. DBAs monitor index size and write performance to balance benefits.
Connections
Materialized Views
Both store precomputed data to speed up queries, but materialized views store full query results while covering indexes store extra columns in indexes.
Understanding covering indexes helps grasp how databases optimize read performance by storing extra data close to the query path, similar to materialized views but at a lower storage cost.
Cache Memory in CPUs
Covering indexes reduce the need to access slower main table data, similar to how CPU caches store frequently used data to avoid slower memory access.
Knowing this connection highlights the principle of storing data closer to where it's needed to speed up access, a common pattern in computing.
Library Card Catalogs
Covering indexes are like library card catalogs that include summaries, allowing quick decisions without fetching the full book.
This connection shows how adding extra information to an index can reduce the need for expensive lookups, a principle used in many information systems.
Common Pitfalls
#1Including columns that are frequently updated, causing slow write performance.
Wrong approach:CREATE INDEX idx_orders_customer ON orders(customer_id) INCLUDE (order_status, updated_at, large_text_column);
Correct approach:CREATE INDEX idx_orders_customer ON orders(customer_id) INCLUDE (order_status);
Root cause:Misunderstanding that INCLUDE columns increase index size and update cost, especially for large or frequently changed columns.
#2Trying to filter or sort by INCLUDE columns expecting index usage.
Wrong approach:SELECT * FROM users WHERE last_login > '2024-01-01' ORDER BY status; -- last_login and status are only included columns
Correct approach:CREATE INDEX idx_users_last_login ON users(last_login); SELECT * FROM users WHERE last_login > '2024-01-01' ORDER BY status;
Root cause:Confusing INCLUDE columns as searchable keys rather than just stored data for covering.
#3Assuming all queries benefit from covering indexes and adding INCLUDE columns indiscriminately.
Wrong approach:CREATE INDEX idx_products_name ON products(name) INCLUDE (description, price, stock, supplier, category, weight, dimensions);
Correct approach:CREATE INDEX idx_products_name ON products(name) INCLUDE (price, stock);
Root cause:Lack of query analysis leading to over-indexing and bloated indexes.
Key Takeaways
Covering indexes with INCLUDE store extra columns in the index to avoid accessing the main table, speeding up queries.
INCLUDE columns cannot be used for filtering or sorting; they only help cover queries that select those columns.
Adding too many or large INCLUDE columns increases index size and slows down write operations.
Understanding when and how to use INCLUDE helps balance read performance with write cost.
PostgreSQL stores INCLUDE columns only in leaf nodes, keeping index search efficient while providing extra data.

Practice

(1/5)
1. What is the main purpose of using INCLUDE in a PostgreSQL index?
easy
A. To change the data type of indexed columns
B. To create a unique constraint on the indexed columns
C. To delete columns from the index
D. To add extra columns to the index for faster SELECT queries without searching on them

Solution

  1. Step 1: Understand the role of INCLUDE in indexes

    INCLUDE adds extra columns to the index that are not used for searching but can be returned in queries.
  2. Step 2: Identify the benefit of these extra columns

    These extra columns help avoid reading the main table, speeding up SELECT queries that need those columns.
  3. Final Answer:

    To add extra columns to the index for faster SELECT queries without searching on them -> Option D
  4. Quick Check:

    INCLUDE adds columns for SELECT speed [OK]
Hint: INCLUDE adds columns to speed SELECT, not for searching [OK]
Common Mistakes:
  • Thinking INCLUDE creates unique constraints
  • Believing INCLUDE removes columns
  • Assuming INCLUDE changes data types
2. Which of the following is the correct syntax to create a covering index on table users for column email and include last_login?
easy
A. CREATE INDEX idx_email ON users(email) INCLUDE (last_login);
B. CREATE INDEX idx_email ON users(email, last_login);
C. CREATE INDEX idx_email ON users INCLUDE (email, last_login);
D. CREATE INDEX idx_email ON users(email) WITH (last_login);

Solution

  1. Step 1: Recall the syntax for INCLUDE in PostgreSQL indexes

    The correct syntax is to specify indexed columns first, then use INCLUDE for extra columns.
  2. Step 2: Match the syntax to the options

    CREATE INDEX idx_email ON users(email) INCLUDE (last_login); correctly uses CREATE INDEX idx_email ON users(email) INCLUDE (last_login);
  3. Final Answer:

    CREATE INDEX idx_email ON users(email) INCLUDE (last_login); -> Option A
  4. Quick Check:

    Indexed columns first, INCLUDE for extras [OK]
Hint: Indexed columns before INCLUDE clause [OK]
Common Mistakes:
  • Putting all columns inside parentheses without INCLUDE
  • Using WITH instead of INCLUDE
  • Including columns in wrong order
3. Given the index CREATE INDEX idx_name ON employees(last_name) INCLUDE (first_name, department);, what will the query SELECT last_name, first_name FROM employees WHERE last_name = 'Smith'; most likely do?
medium
A. Use the index but still access the table to get first_name
B. Use the index to find rows and return both last_name and first_name without accessing the table
C. Scan the whole table because first_name is not indexed
D. Return an error because first_name is not indexed

Solution

  1. Step 1: Understand what INCLUDE columns do in the index

    INCLUDE columns are stored in the index to avoid accessing the main table for those columns.
  2. Step 2: Analyze the query and index usage

    The query filters on last_name (indexed) and selects first_name (included). The index covers both, so no table access needed.
  3. Final Answer:

    Use the index to find rows and return both last_name and first_name without accessing the table -> Option B
  4. Quick Check:

    INCLUDE columns avoid table access [OK]
Hint: INCLUDE columns can be returned without table access [OK]
Common Mistakes:
  • Assuming INCLUDE columns are not stored in the index
  • Thinking table scan is always needed
  • Confusing INCLUDE with indexed columns
4. You wrote this index: CREATE INDEX idx_order ON orders(order_date) INCLUDE (customer_id; but get a syntax error. What is the problem?
medium
A. INCLUDE cannot be used with order_date
B. You must list customer_id before order_date
C. Missing closing parenthesis after customer_id
D. INCLUDE requires at least two columns

Solution

  1. Step 1: Check the syntax of the CREATE INDEX statement

    The statement has an opening parenthesis after INCLUDE but no closing parenthesis.
  2. Step 2: Identify the syntax error

    Missing closing parenthesis causes the syntax error.
  3. Final Answer:

    Missing closing parenthesis after customer_id -> Option C
  4. Quick Check:

    Parentheses must be balanced [OK]
Hint: Check parentheses carefully in INCLUDE clause [OK]
Common Mistakes:
  • Forgetting closing parenthesis
  • Misordering columns
  • Thinking INCLUDE needs multiple columns
5. You want to speed up this query: SELECT product_id, price, stock FROM products WHERE product_id = 123; by creating a covering index. Which index is best?
hard
A. CREATE INDEX idx_product ON products(product_id) INCLUDE (price, stock);
B. CREATE INDEX idx_product ON products(price, stock) INCLUDE (product_id);
C. CREATE INDEX idx_product ON products(product_id, price, stock);
D. CREATE INDEX idx_product ON products(product_id);

Solution

  1. Step 1: Identify the filtering and selected columns in the query

    The query filters on product_id and selects price and stock.
  2. Step 2: Choose an index that filters on product_id and includes price and stock

    CREATE INDEX idx_product ON products(product_id) INCLUDE (price, stock); indexes product_id and includes price and stock, covering the query efficiently.
  3. Step 3: Compare other options

    CREATE INDEX idx_product ON products(price, stock) INCLUDE (product_id); indexes price and stock, not filtering column; CREATE INDEX idx_product ON products(product_id, price, stock); indexes all columns but includes unnecessary columns in index key; CREATE INDEX idx_product ON products(product_id); lacks included columns, so table access needed.
  4. Final Answer:

    CREATE INDEX idx_product ON products(product_id) INCLUDE (price, stock); -> Option A
  5. Quick Check:

    Filter column indexed, others included [OK]
Hint: Index filter column, INCLUDE others for covering [OK]
Common Mistakes:
  • Including filter columns instead of indexing them
  • Indexing all columns as keys unnecessarily
  • Not including selected columns causing table access