Correlated subquery execution model in SQL - Time & Space Complexity
When using correlated subqueries, the database runs a smaller query for each row of the main query.
We want to understand how this repeated work grows as the data gets bigger.
Analyze the time complexity of the following SQL query with a correlated subquery.
SELECT e.employee_id, e.name
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
This query finds employees whose salary is above the average salary in their own department.
Look for repeated work inside the query.
- Primary operation: The subquery runs once for each employee row.
- How many times: As many times as there are employees (n times).
Each employee triggers a subquery that scans employees in their department.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 times scanning small groups |
| 100 | About 100 times scanning groups of employees |
| 1000 | About 1000 times scanning groups, more work overall |
Pattern observation: The total work grows roughly with the square of the number of employees if departments are evenly sized.
Time Complexity: O(n * m)
This means the query work grows with the number of employees times the average size of their departments.
[X] Wrong: "The subquery runs just once, so the query is fast regardless of data size."
[OK] Correct: The subquery depends on each employee's department, so it runs repeatedly, increasing total work as data grows.
Understanding how correlated subqueries work helps you explain query performance clearly and shows you can think about how databases handle repeated work.
"What if the subquery was uncorrelated and ran only once? How would the time complexity change?"