Subquery in FROM clause (derived table) in SQL - Time & Space Complexity
When we use a subquery inside the FROM clause, the database treats its result as a temporary table, called a derived table. We want to understand how the time to run this query grows as the underlying data gets bigger.
How does the size of the original table affect the total work done?
Analyze the time complexity of the following code snippet.
```sql
SELECT dt.customer_id, dt.total_orders
FROM (
    SELECT customer_id, COUNT(*) AS total_orders
    FROM orders
    GROUP BY customer_id
) AS dt
WHERE dt.total_orders > 5;
```
This query counts orders per customer in a subquery, then filters customers with more than 5 orders.
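To see the query run end to end, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and sample data are invented for illustration; only the query itself comes from the text above.

```python
import sqlite3

# In-memory database with a minimal orders table (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)"
)

# Customer 1 places 7 orders, customer 2 places 3, customer 3 places 6.
rows = [(1,)] * 7 + [(2,)] * 3 + [(3,)] * 6
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)", rows)

# The derived-table query from the text: count per customer, then filter > 5.
query = """
    SELECT dt.customer_id, dt.total_orders
    FROM (
        SELECT customer_id, COUNT(*) AS total_orders
        FROM orders
        GROUP BY customer_id
    ) AS dt
    WHERE dt.total_orders > 5
"""
print(conn.execute(query).fetchall())  # customers 1 and 3 qualify
```

Running it shows that the outer WHERE sees only one row per customer, not one row per order: the heavy lifting happened once, inside the derived table.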
Identify the operations that repeat, the SQL equivalent of loops or array traversals.
- Primary operation: scanning every row of the orders table once to count orders per customer.
- How many times: one pass over all n order rows; the grouping happens during that same pass (via a hash table or a sort).
As the number of orders grows, the database must look at each order once to count them.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 scans and counts |
| 100 | About 100 scans and counts |
| 1000 | About 1000 scans and counts |
Pattern observation: The work grows roughly in direct proportion to the number of orders.
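The table above can be checked with a small operation counter. This is a simulation of the scan, not a real database; the 5-customer split is an arbitrary choice:

```python
def count_operations(n):
    """Simulate the aggregation: one 'operation' per row scanned."""
    orders = [(i, i % 5) for i in range(n)]  # (order_id, customer_id)
    ops = 0
    totals = {}
    for _, customer_id in orders:
        totals[customer_id] = totals.get(customer_id, 0) + 1
        ops += 1  # each row costs exactly one visit
    return ops

for n in (10, 100, 1000):
    print(n, count_operations(n))  # operations grow in step with n
```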
Time Complexity: O(n)
The time to run the query grows roughly linearly with the number of orders in the table. (If the database sorts the rows to group them instead of using a hash table, the grouping step is O(n log n), but each row is still scanned only once.)
[X] Wrong: "The subquery runs multiple times, so the time grows faster than the table size."
[OK] Correct: The subquery runs once to create the derived table, so the main cost is scanning the orders table once, not repeatedly.
Understanding how subqueries in the FROM clause affect performance helps you write efficient queries and explain your reasoning clearly in real-world situations.
"What if we added an index on customer_id in the orders table? How would the time complexity change?"
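One way to explore that question is SQLite's EXPLAIN QUERY PLAN (a sketch; the exact plan text varies by SQLite version, and the index name is a hypothetical choice). An index on customer_id lets the database read the groups in order from the index instead of sorting, but it still touches every row, so the overall growth stays linear:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)"
)
# 1000 orders spread over 10 customers (synthetic data).
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)",
                 [(i % 10,) for i in range(1000)])

query = """
    SELECT customer_id, COUNT(*) AS total_orders
    FROM orders
    GROUP BY customer_id
"""

# Plan without an index: typically a full table scan plus a grouping step.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())

# Hypothetical index name, chosen for this sketch.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Plan with the index: typically an index scan with no separate sort,
# but still one visit per row -- the counting itself remains O(n).
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())
```

In short: the index can remove the sort and speed up the constant factors, but counting all orders still requires looking at all n of them.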