0
0
DBMS Theoryknowledge~6 mins

Index selection guidelines in DBMS Theory - Full Explanation

Choose your learning style9 modes available
Introduction
When databases grow large, finding data quickly becomes a challenge. Indexes help speed up searches, but choosing the right index is important to keep the database fast and efficient.
Explanation
Understand Query Patterns
Look at how the database is used. Identify which columns are often searched, filtered, or sorted. Indexes work best when they match these common query patterns.
Indexes should be chosen based on the most frequent and important queries.
Choose Columns with High Selectivity
Selectivity means how many unique values a column has compared to total rows. Columns with many unique values make better index candidates because they narrow down searches more effectively.
High selectivity columns improve index efficiency by reducing search results quickly.
Consider Composite Indexes
Sometimes queries filter on multiple columns together. Creating an index that covers these columns in the right order can speed up these combined searches.
Composite indexes optimize queries filtering on multiple columns simultaneously.
Balance Read and Write Performance
Indexes speed up reading data but slow down writing because the index must be updated. Choose indexes that improve read speed without causing too much write delay.
Good index selection balances faster reads with acceptable write performance.
Avoid Indexing Low-Selectivity Columns
Columns with few unique values, like boolean flags, usually do not benefit from indexing because they do not reduce search results much.
Indexing low-selectivity columns often wastes resources without improving speed.
Use Covering Indexes When Possible
A covering index includes all columns a query needs, so the database can answer the query using only the index without looking at the main data.
Covering indexes can greatly speed up queries by avoiding extra data lookups.
Real World Analogy

Imagine a large library where you want to find books quickly. If you know which shelves hold books on your favorite topics and the books are well organized by author and title, you find your book faster. But if the shelves are messy or you have to check every book, it takes longer.

Understand Query Patterns → Knowing which topics you search for most often in the library
Choose Columns with High Selectivity → Looking for books by a specific author rather than just any book
Consider Composite Indexes → Finding books sorted by author and then by title on the shelf
Balance Read and Write Performance → Organizing shelves to help readers find books quickly without making it hard to add new books
Avoid Indexing Low-Selectivity Columns → Not organizing books by color of cover because it doesn't help find books
Use Covering Indexes When Possible → Having a catalog card that lists all details you need so you don't have to look inside the book
Diagram
Diagram
┌───────────────────────────────┐
│         Query Patterns         │
├───────────────┬───────────────┤
│ High Selectivity Columns       │
├───────────────┼───────────────┤
│ Composite Indexes              │
├───────────────┼───────────────┤
│ Covering Indexes              │
├───────────────┼───────────────┤
│ Avoid Low Selectivity Columns  │
├───────────────┼───────────────┤
│ Balance Read/Write Performance │
└───────────────────────────────┘
This diagram shows the main guidelines for selecting indexes arranged as key considerations.
Key Facts
IndexA data structure that speeds up data retrieval in a database.
SelectivityThe uniqueness of values in a column compared to total rows.
Composite IndexAn index on multiple columns used together in queries.
Covering IndexAn index that contains all columns needed to satisfy a query.
Read-Write TradeoffIndexes speed up reads but slow down writes due to maintenance.
Code Example
DBMS Theory
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Create a sample table
cur.execute('CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary INTEGER)')

# Insert sample data
cur.executemany('INSERT INTO employees VALUES (?, ?, ?, ?)', [
    (1, 'Alice', 'Sales', 70000),
    (2, 'Bob', 'HR', 50000),
    (3, 'Charlie', 'Sales', 60000),
    (4, 'Diana', 'IT', 80000),
    (5, 'Eve', 'IT', 75000)
])

# Create an index on department (low selectivity) and salary
cur.execute('CREATE INDEX idx_dept_salary ON employees(department, salary)')

# Query using the index
cur.execute('SELECT name FROM employees WHERE department = ? AND salary > ?', ('IT', 70000))
for row in cur.fetchall():
    print(row[0])
OutputSuccess
Common Confusions
Believing that indexing every column always improves performance
Believing that indexing every column always improves performance Indexing every column can slow down writes and use extra space; only index columns that improve important queries.
Assuming low-selectivity columns are good index candidates
Assuming low-selectivity columns are good index candidates Columns with few unique values rarely help indexes filter data effectively and usually should not be indexed.
Thinking composite indexes work regardless of column order
Thinking composite indexes work regardless of column order The order of columns in a composite index matters because queries must match the index order to benefit.
Summary
Indexes speed up data searches but must be chosen based on how the database is used.
Columns with many unique values and those used together in queries make good index candidates.
Good index selection balances faster reads with acceptable impact on writes.