How GROUP BY changes query execution in SQL - Performance & Efficiency
When we use GROUP BY in SQL, the database groups rows before doing calculations.
We want to know how this grouping affects the time it takes to run the query.
Analyze the time complexity of the following code snippet.
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
This query counts how many employees are in each department by grouping rows by department.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Scanning all employee rows and grouping them by department.
- How many times: Each row is visited once, then grouped into a bucket for its department.
As the number of employees grows, the database must process more rows and assign each to a group.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row checks and group assignments |
| 100 | About 100 row checks and group assignments |
| 1000 | About 1000 row checks and group assignments |
Pattern observation: The work grows roughly in direct proportion to the number of rows.
Time Complexity: O(n)
This means the time to run the query grows roughly in a straight line with the number of rows.
[X] Wrong: "GROUP BY makes the query take much longer than just scanning rows because it does extra work for each group."
[OK] Correct: Grouping adds some work, but the main cost is still reading each row once. The grouping step is efficient and scales linearly with input size.
Understanding how GROUP BY affects query time helps you explain how databases handle data summarization efficiently.
"What if we added an ORDER BY after GROUP BY? How would the time complexity change?"