
Index selection guidelines in DBMS Theory - Full Explanation

Introduction
When databases grow large, finding data quickly becomes a challenge. Indexes help speed up searches, but choosing the right index is important to keep the database fast and efficient.
Explanation
Understand Query Patterns
Look at how the database is used. Identify which columns are often searched, filtered, or sorted. Indexes work best when they match these common query patterns.
Indexes should be chosen based on the most frequent and important queries.
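One way to check whether an index matches a query pattern is to ask the database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN; the orders table, its data, and the index name idx_orders_customer are assumptions made up for illustration, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Hypothetical table and data, used only for this illustration
cur.execute('CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)')
cur.executemany('INSERT INTO orders VALUES (?, ?, ?)',
                [(i, i % 100, '2024-01-01') for i in range(1000)])

query = 'SELECT * FROM orders WHERE customer_id = ?'

def plan(sql, params):
    # The fourth column of each EXPLAIN QUERY PLAN row holds the readable detail
    rows = cur.execute('EXPLAIN QUERY PLAN ' + sql, params).fetchall()
    return ' | '.join(row[3] for row in rows)

# Without an index, the frequent query has to scan the whole table
plan_before = plan(query, (42,))

# Add an index that matches the query's filter column
cur.execute('CREATE INDEX idx_orders_customer ON orders(customer_id)')
plan_after = plan(query, (42,))

print('before:', plan_before)
print('after: ', plan_after)
```

Comparing the two plans for the most frequent queries is a quick way to confirm that a proposed index is actually used before keeping it.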
Choose Columns with High Selectivity
Selectivity is the ratio of distinct values in a column to the total number of rows. Columns with many distinct values make better index candidates because each lookup narrows the search to far fewer rows.
High selectivity columns improve index efficiency by reducing search results quickly.
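Selectivity can be estimated directly from the data. The following is a minimal sketch, assuming a made-up users table in which email is unique, department has four values, and is_active has two; it computes distinct values divided by total rows for each column.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Hypothetical table: email is unique, department has 4 values, is_active has 2
cur.execute('CREATE TABLE users (email TEXT, department TEXT, is_active INTEGER)')
departments = ['Sales', 'HR', 'IT', 'Finance']
cur.executemany('INSERT INTO users VALUES (?, ?, ?)',
                [(f'user{i}@example.com', departments[i % 4], i % 2)
                 for i in range(1000)])

selectivity = {}
for column in ('email', 'department', 'is_active'):
    # Column names cannot be bound as parameters; they come from a fixed tuple here
    cur.execute(f'SELECT COUNT(DISTINCT {column}) * 1.0 / COUNT(*) FROM users')
    selectivity[column] = cur.fetchone()[0]

for column, value in selectivity.items():
    print(f'{column}: {value:.3f}')
```

A value near 1 (email here) suggests a strong index candidate, while a value near 0 (is_active here) suggests the index would filter out very little.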
Consider Composite Indexes
Sometimes queries filter on multiple columns together. Creating an index that covers these columns in the right order can speed up these combined searches.
Composite indexes optimize queries filtering on multiple columns simultaneously.
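To illustrate why column order matters, the sketch below builds a composite index on (product_id, region) and compares the plans for two queries. The sales table, its data, and the index name idx_sales_product_region are assumptions for illustration; the behavior shown is SQLite's default without ANALYZE statistics, and other databases may plan differently.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Hypothetical table and data, used only for this illustration
cur.execute('CREATE TABLE sales (sale_id INTEGER, product_id INTEGER, sale_date TEXT, region TEXT)')
regions = ['North', 'South', 'East', 'West']
cur.executemany('INSERT INTO sales VALUES (?, ?, ?, ?)',
                [(i, i % 50, '2024-01-01', regions[i % 4]) for i in range(1000)])

# Composite index with product_id as the leading column
cur.execute('CREATE INDEX idx_sales_product_region ON sales(product_id, region)')

def plan(sql, params):
    # The fourth column of each EXPLAIN QUERY PLAN row holds the readable detail
    rows = cur.execute('EXPLAIN QUERY PLAN ' + sql, params).fetchall()
    return ' | '.join(row[3] for row in rows)

# Filters on both columns: the composite index can be used
plan_both = plan('SELECT * FROM sales WHERE product_id = ? AND region = ?', (7, 'East'))

# Filters only on the second column: the index's leading column is missing
plan_region_only = plan('SELECT * FROM sales WHERE region = ?', ('East',))

print('both columns:', plan_both)
print('region only: ', plan_region_only)
```

In this setup the first query searches through the index, while the second falls back to scanning the table because it does not filter on the leading column.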
Balance Read and Write Performance
Indexes speed up reading data but slow down writing, because every insert, update, or delete must also update each affected index. Choose indexes that improve read speed without causing too much write delay.
Good index selection balances faster reads with acceptable write performance.
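The write-side cost can be observed with a small experiment. This is a rough sketch, not a benchmark: it loads the same made-up rows into a hypothetical users table with and without two indexes, then reports the elapsed time and the number of database pages used. Timings vary by machine, so only the page counts should be treated as a stable signal.

```python
import sqlite3
import time

def load(with_indexes):
    conn = sqlite3.connect(':memory:')
    cur = conn.cursor()
    # Hypothetical table used only for this illustration
    cur.execute('CREATE TABLE users (id INTEGER, email TEXT, city TEXT)')
    if with_indexes:
        cur.execute('CREATE INDEX idx_users_email ON users(email)')
        cur.execute('CREATE INDEX idx_users_city ON users(city)')
    rows = [(i, f'user{i}@example.com', f'city{i % 100}') for i in range(20000)]
    start = time.perf_counter()
    cur.executemany('INSERT INTO users VALUES (?, ?, ?)', rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    # page_count reports how many pages the database occupies, indexes included
    pages = cur.execute('PRAGMA page_count').fetchone()[0]
    conn.close()
    return elapsed, pages

plain_time, plain_pages = load(with_indexes=False)
indexed_time, indexed_pages = load(with_indexes=True)

print(f'no indexes:  {plain_time:.4f}s, {plain_pages} pages')
print(f'two indexes: {indexed_time:.4f}s, {indexed_pages} pages')
```

The indexed version stores extra pages that must be kept up to date on every insert, which is the maintenance work that slows writes.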
Avoid Indexing Low-Selectivity Columns
Columns with few unique values, like boolean flags, usually do not benefit from indexing because they do not reduce search results much.
Indexing low-selectivity columns often wastes resources without improving speed.
Use Covering Indexes When Possible
A covering index includes all columns a query needs, so the database can answer the query using only the index without looking at the main data.
Covering indexes can greatly speed up queries by avoiding extra data lookups.
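Whether an index covers a query can be checked from the query plan. The sketch below assumes SQLite and an employees table with an index on (department, salary); SQLite labels a plan with COVERING INDEX when it can answer the query from the index alone. Plan wording is specific to SQLite and may differ between versions.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

cur.execute('CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary INTEGER)')
cur.executemany('INSERT INTO employees VALUES (?, ?, ?, ?)', [
    (1, 'Alice', 'Sales', 70000),
    (2, 'Bob', 'HR', 50000),
    (3, 'Charlie', 'Sales', 60000),
    (4, 'Diana', 'IT', 80000),
    (5, 'Eve', 'IT', 75000),
])
cur.execute('CREATE INDEX idx_dept_salary ON employees(department, salary)')

def plan(sql, params):
    # The fourth column of each EXPLAIN QUERY PLAN row holds the readable detail
    rows = cur.execute('EXPLAIN QUERY PLAN ' + sql, params).fetchall()
    return ' | '.join(row[3] for row in rows)

# Needs only department and salary, both stored in the index
plan_covered = plan('SELECT salary FROM employees WHERE department = ?', ('IT',))

# Needs name, which is not in the index, so the table must also be read
plan_not_covered = plan('SELECT name FROM employees WHERE department = ?', ('IT',))

print('covered:    ', plan_covered)
print('not covered:', plan_not_covered)
```

Adding name as a third index column would let the second query be covered too, at the cost of a larger index to maintain.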
Real World Analogy

Imagine a large library where you want to find books quickly. If you know which shelves hold books on your favorite topics and the books are well organized by author and title, you find your book faster. But if the shelves are messy or you have to check every book, it takes longer.

Understand Query Patterns → Knowing which topics you search for most often in the library
Choose Columns with High Selectivity → Looking for books by a specific author rather than just any book
Consider Composite Indexes → Finding books sorted by author and then by title on the shelf
Balance Read and Write Performance → Organizing shelves to help readers find books quickly without making it hard to add new books
Avoid Indexing Low-Selectivity Columns → Not organizing books by color of cover because it doesn't help find books
Use Covering Indexes When Possible → Having a catalog card that lists all details you need so you don't have to look inside the book
Diagram
┌────────────────────────────────┐
│         Query Patterns         │
├────────────────────────────────┤
│ High Selectivity Columns       │
├────────────────────────────────┤
│ Composite Indexes              │
├────────────────────────────────┤
│ Covering Indexes               │
├────────────────────────────────┤
│ Avoid Low Selectivity Columns  │
├────────────────────────────────┤
│ Balance Read/Write Performance │
└────────────────────────────────┘
This diagram shows the main guidelines for selecting indexes arranged as key considerations.
Key Facts
Index: A data structure that speeds up data retrieval in a database.
Selectivity: The uniqueness of values in a column compared to total rows.
Composite Index: An index on multiple columns used together in queries.
Covering Index: An index that contains all columns needed to satisfy a query.
Read-Write Tradeoff: Indexes speed up reads but slow down writes due to maintenance.
Code Example
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Create a sample table
cur.execute('CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary INTEGER)')

# Insert sample data
cur.executemany('INSERT INTO employees VALUES (?, ?, ?, ?)', [
    (1, 'Alice', 'Sales', 70000),
    (2, 'Bob', 'HR', 50000),
    (3, 'Charlie', 'Sales', 60000),
    (4, 'Diana', 'IT', 80000),
    (5, 'Eve', 'IT', 75000)
])

# Create an index on department (low selectivity) and salary
cur.execute('CREATE INDEX idx_dept_salary ON employees(department, salary)')

# Query using the index
cur.execute('SELECT name FROM employees WHERE department = ? AND salary > ?', ('IT', 70000))
for row in cur.fetchall():
    print(row[0])
Output: Diana and Eve, printed one name per line (row order is not guaranteed without ORDER BY).
Common Confusions
Believing that indexing every column always improves performance
Indexing every column slows down writes and uses extra space; index only the columns that improve important queries.
Assuming low-selectivity columns are good index candidates
Columns with few unique values rarely help an index filter data effectively and usually should not be indexed.
Thinking composite indexes work regardless of column order
The order of columns in a composite index matters: the index is most useful when a query filters on its leading column or columns, so a query that filters only on a later column generally cannot use it efficiently.
Summary
Indexes speed up data searches but must be chosen based on how the database is used.
Columns with many unique values and those used together in queries make good index candidates.
Good index selection balances faster reads with acceptable impact on writes.

Practice

1. Which of the following is the best reason to create an index on a database column?
easy
A. To make data entry faster
B. To reduce the size of the database
C. To speed up searches on that column
D. To prevent data duplication

Solution

  1. Step 1: Understand the purpose of an index

    An index is like a shortcut that helps the database find rows faster when searching by that column.
  2. Step 2: Compare options with index purpose

    Only speeding up searches matches the main use of indexes; other options do not relate to indexing benefits.
  3. Final Answer:

    To speed up searches on that column -> Option C
  4. Quick Check:

    Indexes improve search speed = C [OK]
Hint: Indexes speed up searches, not data entry or size [OK]
Common Mistakes:
  • Thinking indexes reduce database size
  • Believing indexes speed up data insertion
  • Confusing indexes with uniqueness constraints
2. Which of the following is the correct SQL syntax to create an index named idx_name on the column last_name of the table employees?
easy
A. CREATE INDEX idx_name ON employees (last_name);
B. CREATE idx_name INDEX ON employees (last_name);
C. INDEX CREATE idx_name ON employees (last_name);
D. CREATE INDEX ON employees idx_name (last_name);

Solution

  1. Step 1: Recall standard SQL syntax for creating an index

    The correct syntax is: CREATE INDEX index_name ON table_name (column_name);
  2. Step 2: Match options to syntax

    CREATE INDEX idx_name ON employees (last_name); matches the correct syntax exactly; others have wrong order or keywords.
  3. Final Answer:

    CREATE INDEX idx_name ON employees (last_name); -> Option A
  4. Quick Check:

    Standard SQL index creation = A [OK]
Hint: Remember: CREATE INDEX name ON table (column) [OK]
Common Mistakes:
  • Swapping keywords order
  • Omitting the INDEX keyword
  • Placing index name after table name incorrectly
3. Consider a table orders with columns order_id, customer_id, and order_date. If you create an index on customer_id, what will be the expected effect when running this query?
SELECT * FROM orders WHERE customer_id = 123;
medium
A. The query will run slower because indexes slow down searches
B. The query will cause an error due to the index
C. The query will return no results because indexes filter data
D. The query will run faster because the index helps find matching rows quickly

Solution

  1. Step 1: Understand index effect on search queries

    An index on customer_id allows the database to quickly locate rows where customer_id = 123 without scanning the whole table.
  2. Step 2: Analyze query behavior with index

    The query uses a WHERE condition on customer_id, so the index speeds up the search, making the query faster.
  3. Final Answer:

    The query will run faster because the index helps find matching rows quickly -> Option D
  4. Quick Check:

    Index speeds up WHERE searches = D [OK]
Hint: Indexes speed up WHERE filters on indexed columns [OK]
Common Mistakes:
  • Thinking indexes slow down searches
  • Believing indexes filter out data
  • Assuming indexes cause errors in queries
4. You created an index on the email column of the users table, but after inserting many new users, the database performance for inserts slowed down significantly. What is the most likely cause?
medium
A. The index was created on the wrong column
B. Indexes slow down data insertion because they must update with each insert
C. The database does not support indexes on email columns
D. The table is too small for indexes to help

Solution

  1. Step 1: Understand index impact on data changes

    Indexes improve search speed but add overhead during inserts because the index structure must be updated for each new row.
  2. Step 2: Analyze why inserts slow down

    Since the index updates on every insert, many inserts cause slower performance, which matches Indexes slow down data insertion because they must update with each insert.
  3. Final Answer:

    Indexes slow down data insertion because they must update with each insert -> Option B
  4. Quick Check:

    Indexes slow inserts due to update overhead = B [OK]
Hint: Indexes slow inserts due to update work [OK]
Common Mistakes:
  • Blaming wrong column choice for insert slowdown
  • Thinking indexes cause errors on email columns
  • Assuming small tables don't need indexes
5. You have a large sales table with columns sale_id, product_id, sale_date, and region. You often run queries filtering by product_id and region together. Which index strategy is best to improve query speed without hurting insert performance too much?
hard
A. Create a composite index on (product_id, region)
B. Create separate indexes on product_id and region
C. Create an index only on sale_date
D. Do not create any indexes to keep inserts fast

Solution

  1. Step 1: Analyze query filter columns

    Queries filter by both product_id and region together, so a composite index on both columns helps the database find matching rows efficiently.
  2. Step 2: Compare index strategies

    Separate indexes may help but are less efficient for combined filters; indexing sale_date is irrelevant to these queries; having no index at all keeps inserts fast but leaves the queries slow.
  3. Final Answer:

    Create a composite index on (product_id, region) -> Option A
  4. Quick Check:

    Composite index matches multi-column filters = A [OK]
Hint: Use composite index for multi-column filters [OK]
Common Mistakes:
  • Creating separate indexes instead of composite
  • Indexing unrelated columns
  • Avoiding indexes and hurting query speed