GROUP BY with aggregate functions in SQL - Time & Space Complexity
When using GROUP BY with aggregate functions, we want to know how the work grows as the data gets bigger.
How does the database handle grouping and summarizing many rows?
Analyze the time complexity of the following code snippet.
```sql
SELECT department, COUNT(*) AS employee_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
```
This query groups employees by their department and calculates the number of employees and average salary per department.
Identify the operations that repeat: the loops, recursion, or row traversals that scale with input size.
- Primary operation: Scanning all rows in the employees table once.
- How many times: Once for each row (n times, where n is total rows).
As the number of rows grows, the database must look at each row to group and calculate aggregates.
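The single-pass behavior described above can be sketched as hash aggregation, the strategy most databases use for a GROUP BY with no helpful index: one scan over the rows, with a dictionary keyed by the grouping column. The function name and sample rows below are illustrative, not taken from a real schema.

```python
def group_by_department(rows):
    """One O(n) pass: group (department, salary) pairs, keeping a
    running count and salary sum per department."""
    groups = {}  # department -> [count, salary_sum]
    for department, salary in rows:
        entry = groups.setdefault(department, [0, 0.0])
        entry[0] += 1          # COUNT(*)
        entry[1] += salary     # running sum for AVG(salary)
    # Finalizing averages touches each group once: O(g), with g <= n.
    return {d: (c, s / c) for d, (c, s) in groups.items()}

# Hypothetical sample rows standing in for the employees table.
employees = [
    ("Engineering", 95000), ("Engineering", 85000),
    ("Sales", 60000), ("Sales", 70000), ("HR", 55000),
]
print(group_by_department(employees))
# {'Engineering': (2, 90000.0), 'Sales': (2, 65000.0), 'HR': (1, 55000.0)}
```

Each input row is visited exactly once, which is why the operation counts in the table below track n directly.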
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row checks and group updates |
| 100 | About 100 row checks and group updates |
| 1000 | About 1000 row checks and group updates |
Pattern observation: The work grows roughly in direct proportion to the number of rows.
Time Complexity: O(n)
This means the time to run the query grows linearly with the number of rows in the table.
Space Complexity: O(g), where g is the number of distinct departments, since the database keeps one running count and sum per group (assuming hash aggregation; g is at most n and usually far smaller).
[X] Wrong: "GROUP BY makes the query run slower by multiplying work for each group."
[OK] Correct: The database still scans each row once; grouping organizes rows into buckets during that single scan rather than repeating it.
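To see this in action, the article's query can be run against an in-memory SQLite database. The sample rows here are hypothetical; the point is that one scan produces one output row per department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Engineering", 95000), ("Engineering", 85000),
     ("Sales", 60000), ("Sales", 70000)],
)
rows = conn.execute(
    "SELECT department, COUNT(*) AS employee_count, AVG(salary) AS avg_salary "
    "FROM employees GROUP BY department"
).fetchall()
print(rows)  # one output row per department, from a single table scan
```

Four input rows collapse to two grouped rows; adding more departments grows the output, but every input row is still read exactly once.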
Understanding how grouping affects query time helps you explain data summarization clearly and shows you understand how databases handle large datasets efficiently.
"What if we added an ORDER BY after GROUP BY? How would the time complexity change?"
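One way to explore that follow-up: an ORDER BY applied after grouping sorts only the g grouped rows, not all n input rows, so the total cost is roughly O(n + g log g), still dominated by the scan when g is small. A sketch with hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Engineering", 95000), ("Sales", 60000), ("HR", 55000)],
)
# The sort runs over the grouped result (g rows), after the O(n) scan.
rows = conn.execute(
    "SELECT department, AVG(salary) AS avg_salary "
    "FROM employees GROUP BY department "
    "ORDER BY avg_salary DESC"
).fetchall()
print(rows)  # [('Engineering', 95000.0), ('Sales', 60000.0), ('HR', 55000.0)]
```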