SELECT with WHERE, ORDER BY, GROUP BY in DBMS Theory - Time & Space Complexity
When we use SELECT queries with WHERE, ORDER BY, and GROUP BY, the database does several steps to get the result.
We want to understand how the time it takes grows as the data gets bigger.
Analyze the time complexity of the following code snippet.
SELECT department, COUNT(*)
FROM employees
WHERE salary > 50000
GROUP BY department
ORDER BY COUNT(*) DESC;
This query filters employees by salary, groups them by department, counts each group, and sorts the groups by count.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Scanning all employee rows to check the salary condition.
- How many times: Once for each row in the employees table.
- Secondary operations: Grouping rows by department and counting them, then sorting the groups.
As the number of employees grows, the database must check each one for the salary condition.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 checks and grouping steps |
| 100 | About 100 checks and grouping steps |
| 1000 | About 1000 checks and grouping steps |
Pattern observation: The work grows roughly in direct proportion to the number of rows.
Time Complexity: O(n + k log k)
This means the time grows linearly with the number of rows scanned (n), plus the time to sort the grouped results (k groups), which is typically less than n.
[X] Wrong: "The query only takes time proportional to the number of rows, so it is O(n)."
[OK] Correct: Sorting the grouped results adds extra work that grows faster than just scanning rows, depending on the number of groups.
Understanding how filtering, grouping, and sorting affect query time helps you explain database performance clearly and confidently.
"What if we removed the ORDER BY clause? How would the time complexity change?"