Introduction
When databases grow large, finding data quickly becomes a challenge. Indexes help speed up searches, but choosing the right index is important to keep the database fast and efficient.
Jump into concepts and practice - no test required
Imagine a large library where you want to find books quickly. If you know which shelves hold books on your favorite topics and the books are well organized by author and title, you find your book faster. But if the shelves are messy or you have to check every book, it takes longer.
┌───────────────────────────────┐ │ Query Patterns │ ├───────────────┬───────────────┤ │ High Selectivity Columns │ ├───────────────┼───────────────┤ │ Composite Indexes │ ├───────────────┼───────────────┤ │ Covering Indexes │ ├───────────────┼───────────────┤ │ Avoid Low Selectivity Columns │ ├───────────────┼───────────────┤ │ Balance Read/Write Performance │ └───────────────────────────────┘
import sqlite3 conn = sqlite3.connect(':memory:') cur = conn.cursor() # Create a sample table cur.execute('CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary INTEGER)') # Insert sample data cur.executemany('INSERT INTO employees VALUES (?, ?, ?, ?)', [ (1, 'Alice', 'Sales', 70000), (2, 'Bob', 'HR', 50000), (3, 'Charlie', 'Sales', 60000), (4, 'Diana', 'IT', 80000), (5, 'Eve', 'IT', 75000) ]) # Create an index on department (low selectivity) and salary cur.execute('CREATE INDEX idx_dept_salary ON employees(department, salary)') # Query using the index cur.execute('SELECT name FROM employees WHERE department = ? AND salary > ?', ('IT', 70000)) for row in cur.fetchall(): print(row[0])
idx_name on the column last_name of the table employees?orders with columns order_id, customer_id, and order_date. If you create an index on customer_id, what will be the expected effect when running this query?SELECT * FROM orders WHERE customer_id = 123;email column of the users table, but after inserting many new users, the database performance for inserts slowed down significantly. What is the most likely cause?sales table with columns sale_id, product_id, sale_date, and region. You often run queries filtering by product_id and region together. Which index strategy is best to improve query speed without hurting insert performance too much?