Correlated subqueries in MySQL - Time & Space Complexity
When using correlated subqueries, the database runs a small query for each row of the main query.
We want to understand how the total work grows as the number of rows increases.
Analyze the time complexity of the following code snippet.
SELECT e1.employee_id, e1.salary
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
);
This query finds employees whose salary is above the average salary in their department.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: For each employee row, the subquery calculates the average salary of that employee's department.
- How many times: The subquery runs once for every employee in the main query.
As the number of employees grows, the subquery runs more times, each time scanning employees in a department.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 times scanning small groups |
| 100 | About 100 times scanning department groups |
| 1000 | About 1000 times scanning department groups |
Pattern observation: The total work grows roughly proportional to the number of employees times the average department size.
Time Complexity: O(n * m)
This means the work grows with the number of employees (n) times the average number of employees per department (m).
[X] Wrong: "The subquery runs only once, so the query is just O(n)."
[OK] Correct: The subquery depends on each row of the main query, so it runs repeatedly, increasing total work.
Understanding how correlated subqueries scale helps you write efficient queries and explain their performance clearly.
"What if the subquery was replaced by a join with grouping? How would the time complexity change?"