WHERE vs HAVING mental model in SQL - Performance Comparison
We want to understand how filtering data at different stages affects the work a database does.
Specifically, how using WHERE and HAVING changes the amount of data processed.
Analyze the time complexity of these two queries.
-- Query 1: Filtering rows before grouping
SELECT department, COUNT(*)
FROM employees
WHERE salary > 50000
GROUP BY department;
-- Query 2: Filtering groups after grouping
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
The first query filters rows before grouping; the second filters groups after grouping.
Look at what repeats as data grows.
- Primary operation: Scanning all employee rows.
- How many times: Once for each row in the table.
- Grouping operation processes all filtered rows or all rows depending on WHERE or HAVING.
Imagine the table grows from 10 to 1000 employees.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | Scan 10 rows, group filtered rows |
| 100 | Scan 100 rows, group filtered rows |
| 1000 | Scan 1000 rows, group filtered rows |
Filtering early with WHERE reduces rows before grouping, so less grouping work.
Filtering late with HAVING means grouping all rows first, then filtering groups.
Time Complexity: O(n)
This means the work grows linearly with the number of rows, but filtering early can reduce the constant work inside.
[X] Wrong: "WHERE and HAVING filters do the same amount of work."
[OK] Correct: WHERE filters rows before grouping, reducing data early. HAVING filters after grouping, so grouping happens on all rows first.
Understanding when filtering happens helps you explain query efficiency clearly and shows you know how databases handle data step-by-step.
What if we added a WHERE condition and a HAVING condition together? How would that affect the time complexity?