Overview - HAVING for filtering groups

What is it?

HAVING is a SQL clause that filters groups of rows after they have been grouped. It works with the GROUP BY clause to apply conditions to aggregated data, such as sums or counts. Unlike WHERE, which filters individual rows before grouping, HAVING filters the groups themselves, so you can keep only the groups that meet specific criteria.

Why it matters

Without HAVING, you could only filter rows before grouping. Selecting groups by summary information such as totals or averages would require wrapping the grouped query in a subquery. That makes common questions awkward to ask, such as finding customers with more than five orders or products with total sales above a threshold. HAVING lets you ask questions about groups, not just single rows.

Where it fits

Before learning HAVING, you should understand basic SQL SELECT queries, filtering with WHERE, and grouping data with GROUP BY. After HAVING, you can explore advanced aggregation functions, window functions, and complex reporting queries that combine multiple filters and groupings.

Mental Model

Core Idea

HAVING filters groups created by GROUP BY based on conditions applied to aggregated data.

Think of it like...

Imagine sorting your mail into piles by sender (GROUP BY). HAVING is like deciding to keep only the piles where the total number of letters is more than five. You first group, then decide which groups to keep based on their size.

SELECT columns
  FROM table
  WHERE row_conditions
  GROUP BY grouping_columns
  HAVING group_conditions

Flow:
[Rows] --WHERE--> [Filtered Rows] --GROUP BY--> [Groups] --HAVING--> [Filtered Groups] --SELECT--> [Result]
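
The mail analogy and the clause skeleton above can be tried end to end. The following is a minimal sketch that runs the SQL through Python's built-in sqlite3 module; the `letters` table, its columns, and the sample rows are made up for illustration and do not come from any real schema.

```python
import sqlite3

# Hypothetical data: one row per letter, tagged with its sender and year.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE letters (sender TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO letters VALUES (?, ?)",
    [("bank", 2024)] * 3
    + [("gym", 2024)] * 1
    + [("bank", 2023)] * 2
    + [("school", 2024)] * 4,
)

busy_senders = conn.execute(
    """
    SELECT sender, COUNT(*) AS letter_count
    FROM letters
    WHERE year = 2024          -- row filter: applied before grouping
    GROUP BY sender            -- one pile per sender
    HAVING COUNT(*) > 2        -- group filter: keep only the larger piles
    ORDER BY sender
    """
).fetchall()

print(busy_senders)
```

With this sample data, the 2023 letters are dropped by WHERE before any pile is formed, and the single-letter pile from `gym` is dropped by HAVING afterwards.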

Build-Up - 7 Steps

1

Foundation: Understanding GROUP BY basics

Concept: Learn how GROUP BY collects rows into groups based on column values.

GROUP BY takes rows that share the same value in specified columns and bundles them into groups. For example, grouping sales by product ID collects all sales of each product together. Aggregation functions like COUNT or SUM then summarize each group.

Result

Rows are organized into groups, each representing a unique value or combination of values from the grouping columns.

Understanding grouping is essential because HAVING works only on these groups, not on individual rows.
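
As a small illustration of the grouping described in this step, the sketch below groups a hypothetical `sales` table by product and summarizes each group with COUNT and SUM. The table name, columns, and values are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10), (1, 15), (2, 7), (3, 20), (3, 5), (3, 5)],
)

# One output row per distinct product_id, with per-group summaries.
per_product = conn.execute(
    """
    SELECT product_id, COUNT(*) AS sale_count, SUM(amount) AS total_amount
    FROM sales
    GROUP BY product_id
    ORDER BY product_id
    """
).fetchall()

for product_id, sale_count, total_amount in per_product:
    print(product_id, sale_count, total_amount)
```

Six input rows collapse into three output rows, one per product. These per-group rows are what a HAVING condition would later test.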

2

Foundation: Filtering rows with WHERE clause

3

Intermediate: Introducing HAVING for group filtering

4

Intermediate: Using aggregate functions in HAVING

5

Intermediate: Difference between WHERE and HAVING

6

Advanced: Combining HAVING with multiple conditions

7

Expert: Performance considerations with HAVING

Under the Hood

When a SQL query with GROUP BY and HAVING runs, the logical order of evaluation is as follows. First, WHERE filters individual rows. The remaining rows are then grouped by the specified columns, and aggregate functions are computed for each group. Finally, HAVING filters the groups based on those aggregates, and only groups that pass are returned. This sequence is why HAVING works on summaries rather than raw rows. A query optimizer may physically reorder the work, but the result must match this logical order.

Why designed this way?

SQL separates row-level filtering (WHERE) from group-level filtering (HAVING) to keep query logic clear. Without HAVING, a condition on an aggregate would have to be written as a filter around a grouped subquery. HAVING expresses the same condition directly in the grouped query, which keeps common aggregate filters short and readable.

Query Execution Flow:

[Input Rows]
    │
    ▼
[WHERE filters rows]
    │
    ▼
[GROUP BY groups rows]
    │
    ▼
[Aggregate functions compute]
    │
    ▼
[HAVING filters groups]
    │
    ▼
[SELECT final output]
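
To make the flow above concrete, the following sketch replays each logical stage by hand in Python and compares the outcome with the equivalent SQL run through sqlite3. It mirrors the logical order only, not how any particular engine executes the query, and the `sales` table and thresholds are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

rows = [(1, 10), (1, 15), (2, 7), (3, 20), (3, 5), (3, 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

sql_result = conn.execute(
    """
    SELECT product_id, SUM(amount)
    FROM sales
    WHERE amount > 0
    GROUP BY product_id
    HAVING SUM(amount) >= 25
    ORDER BY product_id
    """
).fetchall()

# Stage 1: WHERE filters individual rows.
filtered_rows = [(pid, amount) for pid, amount in rows if amount > 0]

# Stage 2: GROUP BY collects the surviving rows into groups.
groups = defaultdict(list)
for pid, amount in filtered_rows:
    groups[pid].append(amount)

# Stage 3: aggregate functions are computed per group.
totals = {pid: sum(amounts) for pid, amounts in groups.items()}

# Stage 4: HAVING filters whole groups using the aggregate.
kept = {pid: total for pid, total in totals.items() if total >= 25}

# Stage 5: SELECT (plus ORDER BY) shapes the final output.
manual_result = sorted(kept.items())

print(sql_result == manual_result)
```

The HAVING step only ever sees `totals`, the per-group summaries, which is why it cannot be used to test the value of an individual row.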

Myth Busters - 4 Common Misconceptions

Quick: Does HAVING filter rows before grouping or groups after grouping? Commit to your answer.

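Common Belief: HAVING filters individual rows just like WHERE does.

Reality: HAVING filters whole groups after GROUP BY has formed them and the aggregates have been computed. WHERE is the clause that filters individual rows, and it runs before grouping.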

Quick: Can you use non-aggregated columns in HAVING without grouping by them? Commit to your answer.

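Common Belief: You can use any column in HAVING without restrictions.

Reality: HAVING conditions should reference only grouped columns or aggregate expressions. A column that is neither grouped nor aggregated has no single value per group, so standard SQL rejects it. Some engines accept it and evaluate it against an arbitrary row of the group.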

Quick: Is HAVING always slower than WHERE? Commit to your answer.

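Common Belief: HAVING is always slower than WHERE and should be avoided.

Reality: HAVING is the only clause that can test an aggregate, so it cannot be avoided for group-level conditions. It is wasteful only when it carries a row-level condition that WHERE could have applied before grouping.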

Quick: Does HAVING replace WHERE for all filtering needs? Commit to your answer.

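Common Belief: HAVING can replace WHERE for all filtering, even on individual rows.

Reality: The two clauses run at different stages. Conditions on columns that are neither grouped nor aggregated must go in WHERE, and row-level conditions placed there also reduce the number of rows that have to be grouped.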

Expert Zone

1

HAVING can reference aliases defined in SELECT only in some SQL dialects, such as MySQL and SQLite. Standard SQL and PostgreSQL do not allow it, so portable queries should repeat the aggregate expression in HAVING.

2

HAVING can be used without GROUP BY in many databases. The whole table is then treated as a single group, so the query returns either one aggregated row or no rows at all.

3

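Combining HAVING with window functions requires careful query structuring because window functions are evaluated after HAVING in the logical execution order. A window function therefore cannot appear in a HAVING condition; to filter on its result, compute it in a subquery or CTE and filter in the outer query.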
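
The third point can be sketched as follows. This example assumes SQLite 3.25 or later (the first version with window functions) and uses a hypothetical `sales` table. HAVING first removes products with fewer than two sales, the window function then ranks only the surviving groups, and the rank is filtered in the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10), (1, 15), (2, 7), (3, 20), (3, 5), (3, 5), (4, 50)],
)

top_ranked = conn.execute(
    """
    WITH totals AS (
        SELECT product_id, SUM(amount) AS total
        FROM sales
        GROUP BY product_id
        HAVING COUNT(*) >= 2            -- group filter runs first
    ),
    ranked AS (
        SELECT product_id, total,
               RANK() OVER (ORDER BY total DESC) AS sales_rank
        FROM totals                     -- ranks only groups that survived HAVING
    )
    SELECT product_id, total, sales_rank
    FROM ranked
    WHERE sales_rank <= 1               -- filter on the window result outside
    ORDER BY product_id
    """
).fetchall()

print(top_ranked)
```

Product 4 has the largest total in the sample data but only one sale, so HAVING removes it before ranking and product 3 takes first place.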

When NOT to use

Avoid HAVING when filtering individual rows; use WHERE instead for better performance. For complex filtering on aggregates, consider using subqueries or CTEs (WITH clauses) to improve readability and optimization.
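
As a rough illustration of the advice above, the sketch below writes the same row-level condition once in WHERE and once in HAVING, using a hypothetical `sales` table. Both forms return the same rows here; the WHERE form states the intent more directly and lets rows be discarded before grouping. Whether an optimizer treats the two forms identically depends on the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10), (1, 15), (2, 7), (3, 20), (3, 5), (3, 5)],
)

# Preferred: the condition does not involve an aggregate, so it goes in WHERE.
with_where = conn.execute(
    """
    SELECT product_id, SUM(amount)
    FROM sales
    WHERE product_id <> 2
    GROUP BY product_id
    ORDER BY product_id
    """
).fetchall()

# Works because product_id is a grouped column, but every row is grouped first.
with_having = conn.execute(
    """
    SELECT product_id, SUM(amount)
    FROM sales
    GROUP BY product_id
    HAVING product_id <> 2
    ORDER BY product_id
    """
).fetchall()

print(with_where == with_having)
```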

Production Patterns

In production, HAVING is often used to find top customers, filter products by sales thresholds, or detect anomalies in grouped data. It is combined with indexes on grouping columns and sometimes with materialized views to speed up repeated queries.

Connections

Aggregation functions

HAVING builds on aggregation functions by filtering groups based on their results.

Understanding aggregation functions deeply helps write precise HAVING conditions that analyze group summaries.

Subqueries

HAVING can sometimes be replaced or complemented by subqueries that filter aggregated data.

Knowing subqueries offers alternative ways to filter groups, useful when HAVING syntax is limited or complex.

Statistics and data summarization

HAVING filters groups based on summary statistics like sums or averages.

Recognizing HAVING as a tool for statistical filtering connects database queries to broader data analysis concepts.
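
The connection to subqueries can be shown with a short sketch: a HAVING condition and a filter around a grouped subquery express the same question. The `sales` table and the threshold are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10), (1, 15), (2, 7), (3, 20), (3, 5), (3, 5)],
)

# Group filter written with HAVING.
via_having = conn.execute(
    """
    SELECT product_id, SUM(amount) AS total
    FROM sales
    GROUP BY product_id
    HAVING SUM(amount) > 20
    ORDER BY product_id
    """
).fetchall()

# Same filter written as WHERE around a grouped subquery.
via_subquery = conn.execute(
    """
    SELECT product_id, total
    FROM (
        SELECT product_id, SUM(amount) AS total
        FROM sales
        GROUP BY product_id
    ) AS totals
    WHERE total > 20
    ORDER BY product_id
    """
).fetchall()

print(via_having == via_subquery)
```

The subquery form is longer, but it gives the aggregate a name that the outer query can reuse, which can help when several later steps depend on it.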

Common Pitfalls

#1 Using WHERE to filter aggregated results instead of HAVING.

Wrong approach: SELECT product_id, COUNT(*) FROM sales WHERE COUNT(*) > 5 GROUP BY product_id;

Correct approach: SELECT product_id, COUNT(*) FROM sales GROUP BY product_id HAVING COUNT(*) > 5;

Root cause: Confusing the roles of WHERE and HAVING. WHERE runs before grouping, so it cannot use aggregate functions.

#2 Using non-aggregated columns in HAVING without grouping by them.

Wrong approach: SELECT product_id, SUM(sales) FROM sales GROUP BY product_id HAVING sales > 1000;

Correct approach: SELECT product_id, SUM(sales) FROM sales GROUP BY product_id HAVING SUM(sales) > 1000;

Root cause: Misunderstanding that HAVING conditions must use aggregates or grouped columns.

#3 Placing HAVING before GROUP BY.

Wrong approach: SELECT COUNT(*) FROM sales HAVING COUNT(*) > 10 GROUP BY product_id;

Correct approach: SELECT product_id, COUNT(*) FROM sales GROUP BY product_id HAVING COUNT(*) > 10;

Root cause: Incorrect order of clauses. HAVING filters the groups created by GROUP BY, so it must come after it.
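
The sketch below checks how SQLite, via Python's sqlite3 module, reacts to two of these pitfalls: an aggregate inside WHERE and HAVING placed before GROUP BY. The `try_query` helper and the sample table are hypothetical, and the exact error text differs between database engines, so only the fact that an error occurs is relied on here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (1, 15), (2, 7)])


def try_query(sql):
    """Return the rows, or an 'error: ...' string if SQLite rejects the query."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


# Aggregate used in WHERE: rejected, because WHERE runs before grouping.
aggregate_in_where = try_query(
    "SELECT product_id, COUNT(*) FROM sales WHERE COUNT(*) > 1 GROUP BY product_id"
)

# HAVING written before GROUP BY: rejected as a syntax error.
having_before_group_by = try_query(
    "SELECT COUNT(*) FROM sales HAVING COUNT(*) > 1 GROUP BY product_id"
)

# Correct clause order with the aggregate condition in HAVING.
correct = try_query(
    "SELECT product_id, COUNT(*) FROM sales GROUP BY product_id HAVING COUNT(*) > 1"
)

print(aggregate_in_where)
print(having_before_group_by)
print(correct)
```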

Key Takeaways

HAVING filters groups created by GROUP BY based on aggregate conditions, unlike WHERE which filters individual rows.

Use WHERE to reduce rows before grouping and HAVING to filter groups after aggregation for efficient queries.

HAVING conditions usually involve aggregate functions like COUNT, SUM, or AVG to test group summaries.

Misusing WHERE and HAVING leads to syntax errors or incorrect results; understanding their roles is crucial.

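For performance, put every condition that does not involve an aggregate in WHERE so fewer rows reach the grouping step, and reserve HAVING for aggregate conditions. Indexes on the filtering and grouping columns can help further.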