NULL behavior in aggregate functions in SQL - Time & Space Complexity
When using aggregate functions in SQL, NULL values can affect how many rows are processed.
We want to understand how the presence of NULLs changes the work done by these functions.
Analyze the time complexity of the following SQL query.
SELECT AVG(salary) FROM employees;
This query calculates the average salary, ignoring NULL salaries.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Scanning each row in the employees table once.
- How many times: Once per row, checking if salary is NULL or not.
The query looks at every row to find salaries that are not NULL and sums them.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 checks and sums |
| 100 | 100 checks and sums |
| 1000 | 1000 checks and sums |
Pattern observation: The work grows directly with the number of rows, regardless of how many NULLs there are.
Time Complexity: O(n)
This means the time to compute the average grows linearly with the number of rows in the table.
[X] Wrong: "NULL values cause the query to skip rows and reduce the time needed."
[OK] Correct: The query still checks every row to see if the value is NULL, so the total work depends on all rows, not just non-NULL ones.
Understanding how NULLs affect aggregate functions helps you explain query performance clearly and shows you know how databases handle data internally.
"What if we changed AVG(salary) to COUNT(salary)? How would the time complexity change?"