0
0
SQLquery~5 mins

Top-N per group query in SQL - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Top-N per group query
O(n log n)
Understanding Time Complexity

When we ask for the top N items in each group from a database, we want to know how the work grows as the data grows.

How does the time to get these top items change when there are more groups or more rows?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


SELECT department, employee, salary
FROM (
  SELECT department, employee, salary,
         ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
  FROM employees
) sub
WHERE rn <= 3;
    

This query finds the top 3 highest salaries in each department from the employees table.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Primary operation: The window function scans all rows and assigns a rank within each department group.
  • How many times: It processes every row once, but internally it sorts each group to assign ranks.
How Execution Grows With Input

Explain the growth pattern intuitively.

Input Size (n)Approx. Operations
10About 10 rows processed, sorting small groups quickly
100About 100 rows processed, sorting groups takes more time
1000About 1000 rows processed, sorting each group grows with group size

Pattern observation: The work grows roughly in proportion to the total number of rows, with extra cost for sorting within each group.

Final Time Complexity

Time Complexity: O(n log n)

This means the time grows a bit faster than the number of rows because sorting happens inside each group.

Common Mistake

[X] Wrong: "The query only looks at the top 3 rows per group, so it runs in constant time per group."

[OK] Correct: The database must first sort all rows in each group to find the top 3, so it processes all rows, not just 3.

Interview Connect

Understanding how grouping and sorting affect query time helps you explain your approach clearly and shows you know how databases handle data efficiently.

Self-Check

"What if we changed ROW_NUMBER() to RANK() or DENSE_RANK()? How would the time complexity change?"