PARTITION BY for grouping windows in PostgreSQL - Time & Space Complexity
We want to understand how the time needed to run a query with PARTITION BY grows as the data gets bigger.
Specifically, how does grouping rows into partitions affect the work the database does?
Analyze the time complexity of the following code snippet.
SELECT employee_id, department_id, salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
FROM employees;
This query ranks employees by salary within each department using a window function with PARTITION BY.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: The database groups rows by department_id, then sorts each group by salary.
- How many times: It does this for each partition (each department), processing all rows once.
As the number of employees grows, the database must sort more rows within each department.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 log 10 operations (sorting small groups) |
| 100 | About 100 log 100 operations (sorting larger groups) |
| 1000 | About 1000 log 1000 operations (sorting even larger groups) |
Pattern observation: The work grows a bit faster than the number of rows because sorting inside each partition takes more time as data grows.
Time Complexity: O(n log n)
This means the time to run the query grows a little faster than the number of rows because sorting happens within each group.
[X] Wrong: "PARTITION BY just groups rows and doesn't add any extra work."
[OK] Correct: Grouping means the database must sort or organize rows inside each group, which takes extra time especially as data grows.
Understanding how PARTITION BY affects query time helps you explain how databases handle grouped data efficiently, a useful skill for real projects and interviews.
"What if we removed ORDER BY inside the window function? How would the time complexity change?"