Why aggregation is needed in SQL - Performance Analysis
We want to understand how the time to run aggregation queries changes as data grows.
How does grouping and summarizing data affect the work the database does?
Analyze the time complexity of the following query.
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
This query counts how many employees are in each department by grouping rows.
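To see the query produce its per-department counts, here is a minimal sketch that runs it against an in-memory SQLite database using Python's standard `sqlite3` module. The `name` column and the five sample rows are assumptions made for illustration; the source only tells us the table is called `employees` and has a `department` column.

```python
import sqlite3

# Hypothetical sample data: the original text does not list any rows.
rows = [
    ("Alice", "Engineering"),
    ("Bob", "Engineering"),
    ("Carol", "Sales"),
    ("Dave", "Sales"),
    ("Eve", "Support"),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: only the department column is implied by the query.
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)

query = """
    SELECT department, COUNT(*) AS employee_count
    FROM employees
    GROUP BY department
"""

# GROUP BY alone does not promise any output order, so collect the
# (department, employee_count) pairs into a dict before inspecting them.
counts = dict(conn.execute(query).fetchall())
print(counts)

conn.close()
```

Each department appears once in the result, paired with the number of rows that belong to it.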
Identify the operations that repeat as the query runs.
- Primary operation: Scanning each row in the employees table once.
- How many times: Once per row, to assign it to a group and update the count.
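The repeated step described above can be sketched in Python as a single loop that assigns each row to a group and updates that group's count. This is only an illustration of the idea, assuming the database groups rows with a hash table; a real engine may choose a different strategy. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def count_by_department(departments):
    """Count rows per department in one pass (hash aggregation sketch)."""
    counts = defaultdict(int)
    rows_visited = 0
    for department in departments:
        rows_visited += 1          # each row is looked at exactly once
        counts[department] += 1    # find the row's group, update its count
    return dict(counts), rows_visited


# Hypothetical input: one department value per employee row.
departments = ["Engineering", "Sales", "Engineering", "Support", "Sales"]
counts, rows_visited = count_by_department(departments)
print(counts, rows_visited)
```

The loop body does a constant amount of work on average per row, so the total work is proportional to the number of rows.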
As the number of employees grows, the database must look at each employee once.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 operations (one per employee) |
| 100 | 100 operations |
| 1000 | 1000 operations |
Pattern observation: The work grows directly with the number of rows.
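The numbers in the table can be reproduced with a small counting experiment. This sketch generates `n` synthetic rows, groups them in one pass, and reports how many row visits were needed. The three generated department names are an assumption; the count of visits does not depend on how many departments there are.

```python
def operations_for(n, num_departments=3):
    """Group n synthetic rows by department and return the row visits needed."""
    counts = {}
    operations = 0
    for i in range(n):
        # Hypothetical department assignment, cycling through a few names.
        department = f"dept_{i % num_departments}"
        counts[department] = counts.get(department, 0) + 1
        operations += 1  # one operation per row
    return operations


for n in (10, 100, 1000):
    print(n, operations_for(n))
# prints:
# 10 10
# 100 100
# 1000 1000
```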
Time Complexity: O(n)
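This means the time to run the aggregation grows linearly with the number of rows: doubling the rows roughly doubles the work. This holds when the database groups rows with a hash table; if it sorts the rows to group them instead, the sort adds an O(n log n) step.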
[X] Wrong: "Aggregation queries are always slow because they do extra work."
[OK] Correct: Aggregation looks at each row once, so its cost grows linearly, at the same rate as a plain scan of the table.
Understanding how aggregation scales helps you explain query performance clearly and confidently.
"What if we added an ORDER BY after the GROUP BY? How would the time complexity change?"
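One way to explore this question is to extend the single-pass grouping with a sort of the grouped results. The sketch below assumes the clause is `ORDER BY department` and that grouping uses a hash table; both are assumptions, and a real database might instead use an index or a sort-based grouping that already yields ordered output. With n rows and g distinct departments, the grouping pass costs O(n) and sorting the g groups costs O(g log g), so the total is O(n + g log g). When almost every row is its own group, that approaches O(n log n).

```python
def grouped_counts_ordered(departments):
    """Count rows per department, then order the groups by department name."""
    counts = {}
    for department in departments:       # O(n): one visit per row
        counts[department] = counts.get(department, 0) + 1
    return sorted(counts.items())        # O(g log g): sort the g groups only


# Hypothetical input: one department value per employee row.
result = grouped_counts_ordered(
    ["Sales", "Engineering", "Sales", "Support", "Engineering"]
)
print(result)
# prints: [('Engineering', 2), ('Sales', 2), ('Support', 1)]
```

Because the sort runs over the groups and not over the original rows, the extra cost stays small whenever the number of departments is much smaller than the number of employees.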