Subquery in WHERE clause in SQL - Time & Space Complexity
We want to understand how the time it takes to run a SQL query with a subquery in the WHERE clause changes as the data grows.
Specifically, we ask: How does the size of the main table and the subquery table affect the total work done?
Analyze the time complexity of the following code snippet.
SELECT employee_id, name
FROM employees
WHERE department_id IN (
SELECT department_id
FROM departments
WHERE location = 'New York'
);
This query finds employees who work in departments located in New York by using a subquery inside the WHERE clause.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Checking each employee against the list of department IDs returned by the subquery.
- How many times: For each employee row, the database checks if their department_id is in the subquery result.
As the number of employees grows, the database must check more rows. Also, the subquery result size depends on how many departments are in New York.
| Input Size (m) | Approx. Operations |
|---|---|
| 10 employees | Checks 10 employees against subquery results (few departments) |
| 100 employees | Checks 100 employees against subquery results (more departments) |
| 1000 employees | Checks 1000 employees against subquery results (even more departments) |
Pattern observation: The work grows roughly in proportion to the number of employees times the size of the subquery result.
Time Complexity: O(m * n)
This means the time grows roughly by multiplying the number of employees (m) by the number of departments in New York (n).
[X] Wrong: "The subquery runs only once, so the query time depends only on the number of employees."
[OK] Correct: The subquery result size affects how many checks happen for each employee, so both sizes matter.
Understanding how subqueries affect performance helps you write efficient queries and explain your reasoning clearly in interviews.
"What if we replaced the IN subquery with a JOIN? How would the time complexity change?"