Window frame (ROWS BETWEEN, RANGE BETWEEN) in PostgreSQL - Time & Space Complexity
When using window frames like ROWS BETWEEN or RANGE BETWEEN in SQL, it is important to understand how the query's work grows as the data size increases.
We want to know how the number of rows processed affects the time it takes to compute the window functions.
Analyze the time complexity of the following PostgreSQL query using a window frame.
SELECT employee_id, department_id, salary,
AVG(salary) OVER (
PARTITION BY department_id
ORDER BY employee_id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS avg_salary
FROM employees;
This query calculates the average salary for each employee considering the current row and the two previous rows within the same department.
Look for repeated work done for each row in the window frame.
- Primary operation: Calculating the average over a sliding window of up to 3 rows per employee.
- How many times: For each row, the database processes up to 3 rows to compute the average.
As the number of rows grows, the work per row depends on the window frame size.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 x 3 = 30 operations |
| 100 | About 100 x 3 = 300 operations |
| 1000 | About 1000 x 3 = 3000 operations |
Pattern observation: The total work grows roughly in a straight line with the number of rows, multiplied by the fixed window size.
Time Complexity: O(n)
This means the time to compute the window function grows linearly with the number of rows, assuming the window frame size is fixed.
[X] Wrong: "The window frame always causes the query to do work on all rows for each row, so it's quadratic."
[OK] Correct: When the window frame size is fixed (like 2 preceding rows), the database only looks at a small, constant number of rows per row, so the work grows linearly, not quadratically.
Understanding how window frames affect query performance helps you write efficient SQL and explain your reasoning clearly in interviews.
What if we changed the window frame from ROWS BETWEEN 2 PRECEDING AND CURRENT ROW to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW? How would the time complexity change?