GROUP BY single and multiple columns in PostgreSQL - Time & Space Complexity
When using GROUP BY in SQL, the database groups rows based on column values.
We want to understand how the time to group data changes as the data grows.
Analyze the time complexity of these two queries:
-- Group by a single column
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
-- Group by multiple columns
SELECT department, role, COUNT(*)
FROM employees
GROUP BY department, role;
These queries count employees grouped by one or two columns.
Look for repeated work done by the database:
- Primary operation: Scanning all rows in the employees table once.
- How many times: Once for each query, but grouping work depends on number of rows.
The database reads each row and places it into a group based on column values.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 grouping steps |
| 100 | About 100 grouping steps |
| 1000 | About 1000 grouping steps |
Pattern observation: The work grows roughly in direct proportion to the number of rows.
Time Complexity: O(n)
This means the time to group grows linearly as the number of rows increases.
[X] Wrong: "Grouping by more columns always makes the query much slower in a way that grows faster than the number of rows."
[OK] Correct: Grouping time mainly depends on scanning rows once; adding more columns changes grouping keys but does not multiply the number of rows processed.
Understanding how grouping scales helps you explain query performance clearly and shows you know how databases handle data.
What if we added an ORDER BY after GROUP BY? How would that affect the time complexity?