Aggregate with NULL handling in SQL - Time & Space Complexity
We want to understand how the time to compute aggregates changes as the data grows, and specifically how handling NULL values affects the work done.
Analyze the time complexity of the following SQL query.
```sql
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) IS NOT NULL;
```
This query calculates the average salary per department. AVG skips NULL salaries, and the HAVING clause then drops any department whose average is NULL, which happens only when every salary in that department is NULL.
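We can watch this NULL behavior directly by running the query against a tiny in-memory SQLite database. The table contents here are made-up illustrative data, not from the original text:

```python
import sqlite3

# In-memory database with a small, hypothetical employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Sales", 50000), ("Sales", None),  # NULL salary: skipped by AVG
        ("IT", 70000), ("IT", 90000),
        ("Legal", None), ("Legal", None),   # all-NULL group: AVG is NULL
    ],
)

rows = conn.execute("""
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    HAVING AVG(salary) IS NOT NULL
""").fetchall()
print(sorted(rows))
```

Sales averages to 50000 because its NULL row is ignored rather than treated as zero, and Legal disappears entirely because its average is NULL and the HAVING clause filters it out.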
Look for repeated work done as data grows.
- Primary operation: Scanning each employee row once to compute averages.
- How many times: Once per row in the employees table.
As the number of employees grows, the work to compute averages grows linearly.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 scans and calculations |
| 100 | 100 scans and calculations |
| 1000 | 1000 scans and calculations |
Pattern observation: Doubling the rows roughly doubles the work.
Time Complexity: O(n)
This means the time grows directly with the number of rows processed.
[X] Wrong: "Handling NULL values makes the query slower by a lot because it adds extra loops."
[OK] Correct: The NULL check happens during the single scan of each row, so it does not add extra passes over the data.
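The single-scan argument above can be sketched as a plain loop. This is an illustrative model of what the database engine does, not its actual implementation; the row data and function name are hypothetical:

```python
def avg_salary_by_department(rows):
    """Model of GROUP BY + AVG over (department, salary) tuples."""
    sums, counts = {}, {}
    for department, salary in rows:   # O(n): each row is visited exactly once
        if salary is None:            # the NULL check happens during this same
            continue                  # pass -- it adds no extra loop over the data
        sums[department] = sums.get(department, 0) + salary
        counts[department] = counts.get(department, 0) + 1
    # Departments whose salaries were all NULL never enter the dicts,
    # mirroring HAVING AVG(salary) IS NOT NULL.
    return {d: sums[d] / counts[d] for d in sums}

rows = [("Sales", 50000), ("Sales", None),
        ("IT", 70000), ("IT", 90000), ("Legal", None)]
print(avg_salary_by_department(rows))
```

The NULL test is one extra comparison per row, a constant-factor cost inside the single O(n) scan, so doubling the rows still just doubles the work.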
Understanding how aggregate queries scale helps you explain performance in real projects.
"What if we added a WHERE clause filtering rows before aggregation? How would that affect time complexity?"