ROW_NUMBER, RANK, DENSE_RANK in PostgreSQL - Time & Space Complexity
When using ROW_NUMBER, RANK, or DENSE_RANK in SQL, it's important to understand how the time to get results grows as the data grows.
We want to know how the database handles sorting and numbering rows as the table gets bigger.
Analyze the time complexity of the following code snippet.
SELECT employee_id, department_id, salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
FROM employees;
This query assigns row numbers and ranks to employees within each department based on salary order.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Sorting rows within each department by salary.
- How many times: Once per department group, but overall it processes all rows.
As the number of employees (rows) grows, the database must sort more data within each department.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 log 10 (sorting 10 rows) |
| 100 | About 100 log 100 (sorting 100 rows) |
| 1000 | About 1000 log 1000 (sorting 1000 rows) |
Pattern observation: The work grows a bit faster than the number of rows because sorting takes more time as data grows.
Time Complexity: O(n log n)
This means the time to assign row numbers and ranks grows a little faster than the number of rows because sorting is involved.
[X] Wrong: "ROW_NUMBER and RANK just scan rows once, so time grows linearly with data size."
[OK] Correct: These functions require sorting rows within groups, which takes more time than a simple scan, so the time grows faster than just the number of rows.
Understanding how window functions like ROW_NUMBER and RANK scale helps you explain query performance clearly and shows you know how databases handle sorting and grouping.
"What if we removed the PARTITION BY clause? How would the time complexity change?"