OVER clause with PARTITION BY in SQL - Time & Space Complexity
We want to understand how the time to run a query with the OVER clause and PARTITION BY grows as the data gets bigger.
Specifically, how does dividing data into groups affect the work done?
Analyze the time complexity of the following code snippet.
SELECT employee_id, department_id, salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dept_rank
FROM employees;
This query ranks employees by salary within each department.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Sorting employees within each department to assign ranks.
- How many times: Once per department group, sorting all employees in that group.
As the number of employees grows, the query sorts each department's employees separately.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 log 10 operations (sorting small groups) |
| 100 | About 100 log 100 operations (sorting larger groups) |
| 1000 | About 1000 log 1000 operations (sorting even larger groups) |
Pattern observation: The work grows a bit faster than the number of rows because sorting takes more time as groups get bigger.
Time Complexity: O(n log n)
This means the time to run the query grows a bit faster than the number of rows because sorting is needed within each group.
[X] Wrong: "The PARTITION BY makes the query run in linear time because it splits the data."
[OK] Correct: Even though data is split, each group still needs sorting, which takes more than just looking at each row once.
Understanding how window functions like OVER with PARTITION BY affect performance shows you know how databases handle grouped data efficiently.
"What if we removed the PARTITION BY clause and ranked all employees together? How would the time complexity change?"