Subqueries and nested queries in DBMS Theory - Time & Space Complexity
When using subqueries or nested queries in a database, it is important to understand how the time to run the query grows as the data size increases.
We want to know how the number of operations changes when the input tables get bigger.
Analyze the time complexity of the following SQL query with a nested subquery.
SELECT employee_id, name
FROM employees
WHERE department_id IN (
SELECT department_id
FROM departments
WHERE location = 'New York'
);
This query finds employees who work in departments located in New York by using a subquery to filter departments first.
Look for repeated steps in the query execution.
- Primary operation: Scanning the employees table and checking each employee's department against the list from the subquery.
- How many times: The employees table is scanned once; the subquery runs once and its result is used for all checks.
As the number of employees (n) grows, the query must check more rows. The subquery result size depends on departments but is usually smaller.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 employees | About 10 checks against the subquery result |
| 100 employees | About 100 checks against the subquery result |
| 1000 employees | About 1000 checks against the subquery result |
Pattern observation: The work grows roughly in direct proportion to the number of employees.
Time Complexity: O(n)
This means the time to run the query grows linearly with the number of employees.
[X] Wrong: "The subquery runs once and does not affect the total time much."
[OK] Correct: Depending on the database, the subquery might be executed repeatedly or its result scanned many times, affecting performance.
Understanding how nested queries affect performance helps you write efficient database queries and shows you can think about real-world data growth.
What if the subquery was correlated, referencing the outer query? How would the time complexity change?