0
0
SQLquery~15 mins

NULL in DISTINCT, GROUP BY, and ORDER BY in SQL - Deep Dive

Choose your learning style9 modes available
Overview - NULL in DISTINCT, GROUP BY, and ORDER BY
What is it?
In SQL, NULL represents missing or unknown data. When using DISTINCT, GROUP BY, or ORDER BY clauses, NULL values behave in special ways that affect how rows are grouped, filtered, or sorted. Understanding how NULL interacts with these clauses helps you write accurate queries and interpret results correctly.
Why it matters
Without knowing how NULL works in these clauses, you might get unexpected results like missing groups, incorrect counts, or surprising orderings. This can lead to wrong reports, bad decisions, or bugs in applications that rely on database queries.
Where it fits
Before learning this, you should understand basic SQL SELECT queries and the purpose of DISTINCT, GROUP BY, and ORDER BY. After this, you can explore advanced SQL topics like NULL handling functions, filtering NULLs, and query optimization.
Mental Model
Core Idea
In SQL, NULL is treated as an unknown value that groups together with other NULLs in DISTINCT and GROUP BY, but sorts differently in ORDER BY depending on the database.
Think of it like...
Think of NULL as a mystery box in a collection: when grouping or removing duplicates, all mystery boxes are treated as the same unknown item, but when sorting, some people put all mystery boxes at the start of the line, others at the end.
┌───────────────┐
│   Data Rows   │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ DISTINCT/GROUP│
│   BY Clause   │
│  NULLs group  │
│  together as  │
│   one group   │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ ORDER BY Clause│
│ NULLs sort at │
│ start or end  │
│ depending on  │
│   DB system   │
└───────────────┘
Build-Up - 7 Steps
1
FoundationWhat is NULL in SQL
🤔
Concept: Introduce NULL as a special marker for missing or unknown data in SQL.
In SQL, NULL means a value is missing or unknown. It is not zero, empty, or a blank string. NULL is a placeholder that says 'we don't know this value'. For example, a person's middle name might be NULL if they don't have one or it is not recorded.
Result
You understand that NULL is different from other values and represents unknown or missing information.
Understanding NULL as unknown data is key because it behaves differently from normal values in queries.
2
FoundationBasics of DISTINCT, GROUP BY, ORDER BY
🤔
Concept: Explain what DISTINCT, GROUP BY, and ORDER BY do in simple terms.
DISTINCT removes duplicate rows from results. GROUP BY collects rows with the same values into groups. ORDER BY sorts rows by specified columns. These clauses help organize and summarize data in queries.
Result
You know the purpose of these clauses and how they change query output.
Knowing these clauses sets the stage to understand how NULL affects their behavior.
3
IntermediateHow NULL behaves in DISTINCT and GROUP BY
🤔Before reading on: do you think NULL values are treated as equal or different when grouping or removing duplicates? Commit to your answer.
Concept: Explain that NULLs are treated as equal in DISTINCT and GROUP BY, forming one group.
When you use DISTINCT or GROUP BY, all rows with NULL in the same column are grouped together as if NULL equals NULL. This means NULLs do not create multiple groups or duplicates; they form a single group representing unknown values.
Result
Queries with NULLs in grouped columns show one group for all NULLs, not multiple separate groups.
Understanding that NULLs group together prevents confusion about missing groups or unexpected duplicates.
4
IntermediateHow NULL sorts in ORDER BY
🤔Before reading on: do you think NULL values always appear at the start, end, or mixed in when sorting? Commit to your answer.
Concept: Explain that NULL sorting depends on the database system and can appear at the start or end of sorted results.
In ORDER BY, NULLs are treated as unknown and their position depends on the database. Some put NULLs first (like PostgreSQL by default), others put NULLs last (like SQL Server). You can often control this with NULLS FIRST or NULLS LAST clauses.
Result
Sorted query results show NULLs either at the beginning or end, affecting how data is viewed.
Knowing NULL sorting behavior helps you predict and control query output order.
5
IntermediateUsing NULLS FIRST and NULLS LAST
🤔
Concept: Introduce explicit control of NULL sorting order in ORDER BY.
Many SQL databases let you specify NULLS FIRST or NULLS LAST in ORDER BY to control where NULLs appear. For example, ORDER BY column ASC NULLS LAST puts NULLs after all values. This helps make sorting predictable and clear.
Result
You can customize sorting to place NULLs exactly where you want in results.
Explicit NULL sorting improves query clarity and user experience.
6
AdvancedNULL impact on aggregate functions with GROUP BY
🤔Before reading on: do you think aggregate functions like COUNT or SUM include NULL values by default? Commit to your answer.
Concept: Explain how NULLs affect aggregate functions when used with GROUP BY.
Aggregate functions handle NULLs differently: COUNT(column) ignores NULLs, COUNT(*) counts all rows, SUM and AVG ignore NULLs. When grouping, NULLs form one group, but aggregates may skip NULL values inside that group, affecting results.
Result
Aggregate results may be lower or different because NULLs are excluded from calculations.
Knowing how NULLs affect aggregates prevents misinterpretation of summary data.
7
ExpertSurprising NULL behavior in DISTINCT and GROUP BY optimizations
🤔Before reading on: do you think NULLs can cause performance or correctness issues in DISTINCT or GROUP BY? Commit to your answer.
Concept: Reveal how some database engines optimize DISTINCT and GROUP BY with NULLs, sometimes causing subtle bugs or performance differences.
Some databases optimize DISTINCT and GROUP BY by treating NULLs specially, like hashing NULLs as equal. This can cause unexpected results if NULLs represent different unknowns or if indexes behave differently. Also, NULLs can affect query plans and performance.
Result
Experienced users see that NULL handling can impact query speed and correctness in edge cases.
Understanding internal NULL handling helps write robust queries and troubleshoot subtle bugs.
Under the Hood
Internally, SQL engines treat NULL as a special marker, not a value. For DISTINCT and GROUP BY, the engine groups all NULLs together by hashing or comparing them as equal for grouping purposes. For ORDER BY, NULLs are sorted based on database-specific rules or explicit NULLS FIRST/LAST directives. Aggregate functions skip NULLs by design to avoid counting unknowns.
Why designed this way?
SQL was designed to handle unknown data gracefully. Grouping NULLs together simplifies query logic and matches the idea that unknowns can't be distinguished. Sorting NULLs at start or end is a practical choice, with options added later for flexibility. Ignoring NULLs in aggregates avoids misleading results from unknown values.
┌───────────────┐
│   Input Rows  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│  NULL Handling│
│  Logic Layer  │
│  - NULLs equal│
│  - NULLs sort │
│  - NULLs skip │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Query Results │
│  DISTINCT,    │
│  GROUP BY,    │
│  ORDER BY     │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do NULLs count as duplicates in DISTINCT? Commit to yes or no.
Common Belief:NULLs are treated as different values, so multiple NULLs appear as duplicates in DISTINCT.
Tap to reveal reality
Reality:All NULLs are treated as equal in DISTINCT, so only one NULL appears in the result.
Why it matters:Believing NULLs are different can cause confusion when expecting multiple NULL rows but seeing only one.
Quick: Does GROUP BY treat NULLs as separate groups? Commit to yes or no.
Common Belief:Each NULL is a unique group because NULL means unknown and can't be equal.
Tap to reveal reality
Reality:GROUP BY treats all NULLs as one group, grouping them together.
Why it matters:Misunderstanding this leads to expecting more groups and wrong aggregation results.
Quick: Are NULLs always sorted at the end in ORDER BY? Commit to yes or no.
Common Belief:NULLs always appear last when sorting ascending.
Tap to reveal reality
Reality:Sorting NULLs depends on the database; some put NULLs first by default, others last.
Why it matters:Assuming NULLs always sort last can cause unexpected order in query results.
Quick: Do aggregate functions like COUNT include NULLs? Commit to yes or no.
Common Belief:COUNT(column) counts NULL values as well.
Tap to reveal reality
Reality:COUNT(column) ignores NULLs; only COUNT(*) counts all rows including NULLs.
Why it matters:Counting NULLs incorrectly leads to wrong totals and misinformed decisions.
Expert Zone
1
Some databases treat NULLs differently in DISTINCT and GROUP BY when using indexes, affecting performance and results subtly.
2
The SQL standard treats NULLs as equal in grouping, but some legacy systems or modes may differ, causing portability issues.
3
NULL sorting behavior can be influenced by collation and locale settings, adding complexity to ORDER BY results.
When NOT to use
Avoid relying on NULL grouping behavior when NULLs represent different unknowns that must be distinguished; instead, use explicit markers or filters. For sorting, if precise NULL placement is critical and your database lacks NULLS FIRST/LAST, consider using CASE expressions to control order.
Production Patterns
In production, developers often filter out NULLs before grouping to avoid ambiguous groups, or use COALESCE to replace NULLs with default values for clearer grouping and sorting. Explicit NULLS FIRST/LAST clauses are used to ensure consistent ordering across environments.
Connections
Three-valued logic (3VL)
Builds-on
Understanding NULL in DISTINCT, GROUP BY, and ORDER BY is easier when you know SQL's three-valued logic, where NULL means unknown and affects comparisons and conditions.
Data cleaning and preprocessing
Builds-on
Knowing how NULL behaves in grouping and sorting helps in data cleaning, where missing values must be handled carefully to avoid misleading summaries or orderings.
Sorting algorithms in computer science
Same pattern
The way SQL sorts NULLs relates to how sorting algorithms handle special or sentinel values, showing a shared pattern of ordering unknown or special cases.
Common Pitfalls
#1Expecting multiple NULL groups in GROUP BY
Wrong approach:SELECT column FROM table GROUP BY column; -- expecting separate groups for each NULL row
Correct approach:SELECT column, COUNT(*) FROM table GROUP BY column; -- NULLs form one group counted together
Root cause:Misunderstanding that NULLs are treated as equal in grouping, not as distinct unknowns.
#2Assuming NULLs always sort last in ORDER BY
Wrong approach:SELECT * FROM table ORDER BY column ASC; -- expecting NULLs at the end but they appear first
Correct approach:SELECT * FROM table ORDER BY column ASC NULLS LAST; -- explicitly placing NULLs at the end
Root cause:Not knowing that NULL sorting defaults vary by database and need explicit control.
#3Counting NULLs with COUNT(column)
Wrong approach:SELECT COUNT(column) FROM table; -- expecting count includes NULLs
Correct approach:SELECT COUNT(*) FROM table; -- counts all rows including NULLs
Root cause:Confusing COUNT(column) which ignores NULLs with COUNT(*) which counts all rows.
Key Takeaways
NULL in SQL means unknown or missing data and behaves specially in queries.
In DISTINCT and GROUP BY, all NULLs are treated as equal and form one group or one distinct value.
ORDER BY sorts NULLs differently depending on the database, often placing them at the start or end, controllable with NULLS FIRST or NULLS LAST.
Aggregate functions like COUNT(column) ignore NULLs, while COUNT(*) counts all rows regardless of NULLs.
Understanding these behaviors prevents common mistakes and helps write accurate, predictable SQL queries.