Why aggregation matters in PostgreSQL - Performance Analysis
When we use aggregation in databases, we combine many rows into summary results. Understanding how long this takes helps us know how well our queries will perform as data grows.
We want to see how the work needed changes when we have more data to aggregate.
Analyze the time complexity of the following query.

```sql
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
```
This query counts how many employees are in each department by grouping rows with the same department.
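To make the query concrete, here is a minimal sketch that runs the same aggregation against an in-memory SQLite database (the table contents are made-up sample data; PostgreSQL would return the same result for the same rows):

```python
import sqlite3

# In-memory database with a toy employees table (sample data is illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", "Sales"), ("Ben", "Sales"), ("Cara", "HR")],
)

# The same aggregation: one count per department
rows = conn.execute(
    "SELECT department, COUNT(*) AS employee_count "
    "FROM employees GROUP BY department ORDER BY department"
).fetchall()
print(rows)  # [('HR', 1), ('Sales', 2)]
conn.close()
```

The `ORDER BY` is added only to make the output deterministic; it does not change the aggregation itself.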
Identify the repeated work: loops, recursion, or full-table traversals.
- Primary operation: Scanning each row in the employees table once.
- How many times: Once for every employee row (n times).
As the number of employees grows, the database must look at each one to count them by department.
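One way to picture this single pass is the hash aggregation many databases perform internally: visit each row once and keep a running count per group. A sketch in Python (the row data is made up, and `operations` is only a counter for illustration):

```python
# Hash aggregation sketch: visit each row exactly once,
# keeping a running count per department in a dictionary.
rows = [("Ann", "Sales"), ("Ben", "Sales"), ("Cara", "HR"), ("Dev", "HR")]

counts = {}
operations = 0
for _, department in rows:   # one pass over all n rows
    counts[department] = counts.get(department, 0) + 1
    operations += 1          # one O(1) hash update per row

print(counts)      # {'Sales': 2, 'HR': 2}
print(operations)  # equals n = len(rows), hence O(n)
```

Because each row triggers exactly one constant-time dictionary update, the total work tracks the row count directly.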
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row checks |
| 100 | About 100 row checks |
| 1000 | About 1000 row checks |
Pattern observation: The work grows directly with the number of rows; doubling rows doubles the work.
Time Complexity: O(n)
This means the time to run the aggregation grows linearly with the number of rows.
[X] Wrong: "Aggregation runs instantly no matter how much data there is."
[OK] Correct: The database must still look at every row to count or summarize, so more data means more work and more time.
Knowing how aggregation scales helps you explain query performance clearly and shows you understand how databases handle large data sets.
"What if we added an index on the department column? How would the time complexity change?"
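One way to explore that question is to ask the planner directly. Below is a minimal sketch using SQLite's `EXPLAIN QUERY PLAN` (in PostgreSQL you would use `EXPLAIN` instead; the index name `idx_department` is invented for this example). The key idea: an index on `department` lets the engine scan the smaller, pre-sorted index rather than the whole table, but counting still has to visit every entry, so the complexity remains O(n) even if the constant factor shrinks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.execute("CREATE INDEX idx_department ON employees(department)")

# Ask the planner how it would execute the aggregation.
# With the index in place, SQLite can typically satisfy the GROUP BY
# with an index scan instead of a full table scan -- but a scan is
# still a pass over every entry, so the work is still O(n).
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT department, COUNT(*) FROM employees GROUP BY department"
).fetchall()
for step in plan:
    print(step[-1])  # the human-readable plan detail
conn.close()
```

The exact plan text varies by engine and version, which is why the code prints it rather than asserting a specific string.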