Why subqueries are needed in PostgreSQL - Performance Analysis
We want to understand how the time to run a query changes when it uses subqueries.
Specifically, we ask: How does adding a subquery affect the work the database does?
Analyze the time complexity of this SQL query using a subquery.
SELECT employee_id, name
FROM employees
WHERE department_id IN (
SELECT department_id
FROM departments
WHERE location = 'New York'
);
This query finds employees who work in departments located in New York by using a subquery to get those department IDs first.
Look for repeated steps in the query execution.
- Primary operation: The subquery runs first to find matching department IDs.
- How many times: The subquery runs once, then the main query checks each employee against the subquery result.
As the number of departments and employees grows, the work changes like this:
| Input Size (n) | Approx. Operations |
|---|---|
| 10 departments, 100 employees | Subquery: 10 checks, Main query: 100 checks |
| 100 departments, 1,000 employees | Subquery: 100 checks, Main query: 1,000 checks |
| 1,000 departments, 10,000 employees | Subquery: 1,000 checks, Main query: 10,000 checks |
Pattern observation: The subquery work grows with departments, and the main query work grows with employees.
Time Complexity: O(m + n)
This means the total work grows roughly by adding the size of the subquery input (departments) and the main query input (employees).
[X] Wrong: "The subquery runs once for each employee, so time grows like m x n."
[OK] Correct: In this case, the subquery runs only once, and its results are used to filter employees, so the work adds up, not multiplies.
Understanding how subqueries affect query time helps you write efficient database queries and explain your choices clearly in real projects or interviews.
"What if the subquery was correlated to the main query? How would the time complexity change?"