Query optimization strategies in DBMS Theory - Time & Space Complexity
Start learning this pattern below
Jump into concepts and practice - no test required
When running database queries, the time it takes can change a lot depending on how the query is written and how the database handles it.
We want to understand how the work done grows as the data size grows and how optimization helps.
Analyze the time complexity of the following SQL query with and without optimization.
SELECT employee.name, department.name
FROM employee
JOIN department ON employee.dept_id = department.id
WHERE employee.salary > 50000;
This query joins two tables and filters employees with salary over 50000.
Look at what repeats when the query runs.
- Primary operation: Scanning rows in employee and department tables.
- How many times: Each row in employee is checked, and for each matching employee, the department is looked up.
As the number of employees and departments grows, the work grows too.
| Input Size (n employees) | Approx. Operations |
|---|---|
| 10 | About 10 employee checks + department lookups |
| 100 | About 100 employee checks + department lookups |
| 1000 | About 1000 employee checks + department lookups |
Pattern observation: The work grows roughly in direct proportion to the number of employees.
Time Complexity: O(n)
This means the time to run the query grows roughly in a straight line as the number of employees increases.
[X] Wrong: "Adding indexes always makes queries run instantly regardless of data size."
[OK] Correct: Indexes help, but the query still needs to check matching rows, so time still grows with data size, just more slowly.
Understanding how query time grows helps you write better queries and explain your thinking clearly in interviews.
"What if we add an index on employee.salary? How would the time complexity change?"
Practice
Solution
Step 1: Understand the purpose of query optimization
Query optimization aims to improve how quickly and efficiently data is retrieved from a database.Step 2: Compare options with the goal
Only To make data retrieval faster and more efficient matches this goal; others describe unrelated tasks.Final Answer:
To make data retrieval faster and more efficient -> Option AQuick Check:
Query optimization = faster data retrieval [OK]
- Confusing optimization with data deletion
- Thinking optimization increases database size
- Mixing security tasks with optimization
employee_id in the table employees?Solution
Step 1: Recall correct SQL syntax for creating an index
The standard syntax isCREATE INDEX index_name ON table_name(column_name);Step 2: Match options with correct syntax
Only CREATE INDEX idx_emp_id ON employees(employee_id); matches the correct syntax exactly.Final Answer:
CREATE INDEX idx_emp_id ON employees(employee_id); -> Option DQuick Check:
CREATE INDEX ... ON ... (column) [OK]
- Using wrong keywords like MAKE or FOR
- Placing table name before INDEX keyword
- Incorrect order of clauses
SELECT * FROM orders WHERE customer_id = 123;If there is an index on
customer_id, what is the expected effect on query performance?Solution
Step 1: Understand the role of indexes in queries
Indexes help the database find rows matching conditions faster without scanning the entire table.Step 2: Analyze the effect of an index on
Since the query filters bycustomer_idcustomer_id, the index speeds up locating those rows.Final Answer:
The query will run faster by quickly locating matching rows -> Option CQuick Check:
Index on filter column = faster query [OK]
- Thinking indexes slow down SELECT queries
- Assuming indexes cause wrong results
- Believing indexes are always ignored
SELECT * FROM products WHERE price > 100 AND price < 50;What is the main issue affecting query optimization here?
Solution
Step 1: Analyze the WHERE clause conditions
The conditionsprice > 100andprice < 50cannot be true at the same time.Step 2: Understand impact on query results and optimization
This conflict means no rows will match, so the query returns empty results, wasting resources.Final Answer:
The WHERE clause has conflicting conditions making the query return no rows -> Option BQuick Check:
Conflicting WHERE conditions = no results [OK]
- Assuming SELECT * is wrong syntax
- Ignoring logical conflicts in conditions
- Thinking missing index causes no results
sales with columns sale_date, region, and amount. You want to optimize this query:SELECT region, SUM(amount) FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31' GROUP BY region;Which strategy will best improve performance?
Solution
Step 1: Identify columns used in WHERE and GROUP BY clauses
The query filters bysale_dateand groups byregion.Step 2: Choose an index that supports both filtering and grouping
A composite index on(sale_date, region)helps quickly find rows in the date range and group them efficiently.Final Answer:
Create a composite index on (sale_date, region) -> Option AQuick Check:
Composite index on filter and group columns = better performance [OK]
- Indexing only the aggregated column
- Removing GROUP BY loses needed grouping
- Using SELECT * increases data load unnecessarily
