NTILE for distribution in SQL - Time & Space Complexity
We want to understand how the time to run an NTILE query changes as the data grows.
Specifically, how does dividing rows into groups affect performance?
Analyze the time complexity of the following SQL query using NTILE.
SELECT employee_id, salary,
NTILE(4) OVER (ORDER BY salary DESC) AS salary_quartile
FROM employees;
This query divides employees into 4 groups based on salary order.
Look for repeated work done as data grows.
- Primary operation: Sorting all rows by salary.
- How many times: Once for the entire dataset.
- Secondary operation: Assigning each row to a group (NTILE) after sorting.
- How many times: Once per row.
As the number of rows grows, sorting takes more time, and assigning groups grows linearly.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 log 10 + 10 steps |
| 100 | About 100 log 100 + 100 steps |
| 1000 | About 1000 log 1000 + 1000 steps |
Pattern observation: Sorting grows faster than just going through rows, so it dominates.
Time Complexity: O(n log n)
This means the time grows a bit faster than the number of rows because sorting is needed before grouping.
[X] Wrong: "NTILE just splits rows quickly, so it runs in linear time."
[OK] Correct: Before NTILE can assign groups, the data must be sorted, which takes more time than just going through rows once.
Understanding how window functions like NTILE scale helps you explain query performance clearly and shows you know what happens behind the scenes.
What if we removed the ORDER BY clause inside NTILE? How would the time complexity change?