
DISTINCT ON for unique per group in PostgreSQL - Deep Dive

Overview - DISTINCT ON for unique per group
What is it?
DISTINCT ON is a PostgreSQL extension to SELECT DISTINCT that returns one row for each distinct value (or combination of values) of the columns you list. ORDER BY decides which row is kept: within each group, the first row in the sort order wins. This is useful when you want a single representative row per group, such as each customer's latest order, without writing complex queries.
Why it matters
Without DISTINCT ON, selecting one unique row per group often requires complicated subqueries or window functions, which can be hard to write and understand. DISTINCT ON simplifies this task, making queries easier to read and faster to write. This helps when working with grouped data like latest orders per customer or top scores per player.
Where it fits
Before learning DISTINCT ON, you should understand basic SQL SELECT queries, GROUP BY, and ORDER BY clauses. After mastering DISTINCT ON, you can explore window functions and advanced grouping techniques for more complex data analysis.
Mental Model
Core Idea
DISTINCT ON picks the first row from each group of rows defined by certain columns, based on a specified order.
Think of it like...
Imagine you have several stacks of books sorted by genre. DISTINCT ON is like picking the top book from each genre stack after arranging them by publication date.
┌───────────────┐
│ All rows      │
│ (unsorted)    │
└──────┬────────┘
       │ split by DISTINCT ON columns
       ▼
┌───────────────┐
│ Groups formed │
└──────┬────────┘
       │ ORDER BY within groups
       ▼
┌─────────────────────────────┐
│ DISTINCT ON picks first row │
│ per group based on order    │
└─────────────────────────────┘
Build-Up - 7 Steps
1. Foundation: Basic SELECT and GROUP BY
Concept: Understanding how to group rows by columns and select data.
In SQL, GROUP BY lets you group rows that share the same values in specified columns. For example, grouping sales by customer shows total sales per customer. However, GROUP BY usually aggregates data and does not return individual rows.
Result
You get one row per group with aggregated values like sums or counts.
Knowing how GROUP BY works is essential because DISTINCT ON also groups rows but returns one full row per group instead of aggregates.
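A minimal sketch of that difference, assuming a hypothetical sales table created only for this example. GROUP BY collapses each customer's rows into one row of aggregates, so there is no individual sale left to show.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE sales (
    sale_id     integer PRIMARY KEY,
    customer_id integer NOT NULL,
    sale_date   date    NOT NULL,
    amount      numeric(10, 2) NOT NULL
);

INSERT INTO sales (sale_id, customer_id, sale_date, amount) VALUES
    (1, 1, '2024-01-05', 20.00),
    (2, 1, '2024-02-10', 35.00),
    (3, 2, '2024-01-20', 50.00);

-- One row per customer, holding aggregates rather than an individual sale.
-- Adding sale_date to the SELECT list without aggregating it would raise an
-- error, because customer 1 has two different sale dates.
SELECT customer_id,
       count(*)    AS num_sales,
       sum(amount) AS total_amount
FROM sales
GROUP BY customer_id
ORDER BY customer_id;
-- customer 1: num_sales 2, total_amount 55.00
-- customer 2: num_sales 1, total_amount 50.00
```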
2. Foundation: ORDER BY basics in SQL
Concept: Sorting query results by one or more columns.
ORDER BY arranges rows in ascending or descending order. For example, ordering sales by date shows the newest sales first or last. ORDER BY affects which rows appear first in the result.
Result
Rows are sorted as specified, which is important for picking the first row in each group.
Understanding ORDER BY is key because DISTINCT ON relies on it to decide which row to keep per group.
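The sketch below uses a hypothetical orders table. It sorts by customer first and by newest date second, so each customer's most recent order lands at the top of that customer's block of rows. DISTINCT ON reads exactly this kind of sorted layout.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE orders (
    order_id    integer PRIMARY KEY,
    customer_id integer NOT NULL,
    order_date  date    NOT NULL,
    total       numeric(10, 2) NOT NULL
);

INSERT INTO orders (order_id, customer_id, order_date, total) VALUES
    (10, 1, '2024-01-05', 20.00),
    (11, 1, '2024-03-01', 15.00),
    (12, 2, '2024-02-14', 40.00),
    (13, 2, '2024-01-02', 25.00);

-- Customer first, then newest order first within each customer.
SELECT customer_id, order_date, order_id
FROM orders
ORDER BY customer_id, order_date DESC;
-- 1 | 2024-03-01 | 11   <- first row of customer 1's block
-- 1 | 2024-01-05 | 10
-- 2 | 2024-02-14 | 12   <- first row of customer 2's block
-- 2 | 2024-01-02 | 13
```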
3. Intermediate: DISTINCT ON syntax and usage
Before reading on: do you think DISTINCT ON returns all unique rows or just one per group? Commit to your answer.
Concept: DISTINCT ON selects the first row of each group defined by specified columns, based on ORDER BY.
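The syntax is: SELECT DISTINCT ON (column1, column2) * FROM table_name ORDER BY column1, column2, other_columns. It returns one row per unique combination of column1 and column2, choosing the first row according to ORDER BY. If ORDER BY is present, the DISTINCT ON columns must be its leftmost columns; the columns after them (other_columns) decide which row of each group comes first.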
Result
You get one row per group, the one that appears first in the ORDER BY sorting.
Knowing that DISTINCT ON returns only the first row per group helps avoid confusion with regular DISTINCT, which compares entire result rows and keeps one copy of each distinct row.
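To see the two side by side, here is a small sketch over a hypothetical scores table (player, score, played_on). Plain DISTINCT only removes rows that are identical in every selected column, while DISTINCT ON (player) keeps exactly one row per player: the highest score, with the earliest date breaking the tie.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE scores (
    player    text    NOT NULL,
    score     integer NOT NULL,
    played_on date    NOT NULL
);

INSERT INTO scores (player, score, played_on) VALUES
    ('ana', 70, '2024-01-01'),
    ('ana', 90, '2024-01-02'),
    ('ana', 90, '2024-01-03'),
    ('ben', 60, '2024-01-01'),
    ('ben', 85, '2024-01-04');

-- Plain DISTINCT: duplicate (player, score) pairs collapse, all others stay.
SELECT DISTINCT player, score
FROM scores
ORDER BY player, score;
-- ana | 70
-- ana | 90
-- ben | 60
-- ben | 85

-- DISTINCT ON (player): one row per player, the first in ORDER BY order.
SELECT DISTINCT ON (player) player, score, played_on
FROM scores
ORDER BY player, score DESC, played_on;
-- ana | 90 | 2024-01-02
-- ben | 85 | 2024-01-04
```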
4. Intermediate: Combining DISTINCT ON with ORDER BY
Before reading on: does changing ORDER BY columns affect which row DISTINCT ON picks? Commit to your answer.
Concept: ORDER BY controls which row is chosen as the first in each group for DISTINCT ON.
If you want the latest entry per group, order by group columns then date descending. DISTINCT ON picks the first row per group after sorting, so changing ORDER BY changes the chosen row.
Result
The output changes depending on ORDER BY, allowing flexible selection of representative rows.
Understanding the link between ORDER BY and DISTINCT ON lets you control which row per group you get.
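A sketch with a hypothetical logins table. The two queries share the same DISTINCT ON key and differ only in the direction of the second ORDER BY column. That one change switches the result from each user's first login to each user's latest login.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE logins (
    user_id  integer     NOT NULL,
    login_at timestamptz NOT NULL,
    ip       text        NOT NULL
);

INSERT INTO logins (user_id, login_at, ip) VALUES
    (1, '2024-05-01 08:00+00', '10.0.0.1'),
    (1, '2024-05-03 09:30+00', '10.0.0.2'),
    (2, '2024-05-02 12:00+00', '10.0.0.3'),
    (2, '2024-05-04 18:45+00', '10.0.0.4');

-- Ascending time: each user's FIRST login (ip 10.0.0.1 and 10.0.0.3).
SELECT DISTINCT ON (user_id) user_id, login_at, ip
FROM logins
ORDER BY user_id, login_at;

-- Descending time: each user's LATEST login (ip 10.0.0.2 and 10.0.0.4).
SELECT DISTINCT ON (user_id) user_id, login_at, ip
FROM logins
ORDER BY user_id, login_at DESC;
```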
5. Intermediate: DISTINCT ON vs GROUP BY vs window functions
Before reading on: do you think DISTINCT ON can replace all GROUP BY and window function use cases? Commit to your answer.
Concept: DISTINCT ON is simpler for picking one row per group but has limitations compared to GROUP BY and window functions.
GROUP BY aggregates data, window functions can rank or number rows per group, and DISTINCT ON picks the first row per group. DISTINCT ON is easier but less flexible than window functions.
Result
You learn when DISTINCT ON is the best tool and when to use other SQL features.
Knowing the strengths and limits of DISTINCT ON helps choose the right approach for your query.
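The sketch below puts the two approaches side by side for the same task, using a hypothetical readings table: the latest reading per sensor. The first query uses PostgreSQL's DISTINCT ON. The second uses the standard ROW_NUMBER() window function, numbering rows inside each sensor's partition (newest first) and keeping row 1.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE readings (
    sensor_id  integer     NOT NULL,
    reading_at timestamptz NOT NULL,
    value      numeric     NOT NULL,
    PRIMARY KEY (sensor_id, reading_at)
);

INSERT INTO readings (sensor_id, reading_at, value) VALUES
    (1, '2024-06-01 10:00+00', 21.5),
    (1, '2024-06-01 11:00+00', 22.0),
    (2, '2024-06-01 10:30+00', 19.0);

-- PostgreSQL-specific: latest reading per sensor.
SELECT DISTINCT ON (sensor_id) sensor_id, reading_at, value
FROM readings
ORDER BY sensor_id, reading_at DESC;

-- Window-function equivalent: keep row number 1 of each partition.
SELECT sensor_id, reading_at, value
FROM (
    SELECT sensor_id, reading_at, value,
           row_number() OVER (PARTITION BY sensor_id
                              ORDER BY reading_at DESC) AS rn
    FROM readings
) AS ranked
WHERE rn = 1
ORDER BY sensor_id;
-- Both return sensor 1's 11:00 reading (22.0) and sensor 2's only reading (19.0).
```

The window-function form is longer, but it extends naturally to "top N per group" by changing rn = 1 to rn <= N, which DISTINCT ON cannot do.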
6. Advanced: Performance considerations with DISTINCT ON
Before reading on: do you think DISTINCT ON is always faster than window functions? Commit to your answer.
Concept: DISTINCT ON can be efficient but depends on indexes and query structure.
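An index whose leading columns match the query's ORDER BY, for example (customer_id, order_date DESC), can let PostgreSQL read rows already in the required order instead of sorting them, which speeds up DISTINCT ON queries. However, DISTINCT ON still reads every qualifying row, so complex ORDER BY expressions or large datasets can reduce performance, and window functions sometimes perform better for complex needs.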
Result
You understand how to write performant queries using DISTINCT ON.
Knowing how indexes and query plans affect DISTINCT ON helps optimize real-world queries.
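A sketch of the index-plus-plan check, assuming a hypothetical invoices table filled with generated rows. The index's column order and sort directions mirror the query's ORDER BY. The exact plan depends on PostgreSQL version, data volume and statistics. With the index you would hope to see a Unique node on top of an Index Scan rather than on top of a Sort, but that is not guaranteed, so no plan output is shown here.

```sql
-- Hypothetical table with generated data, for illustration only.
CREATE TEMP TABLE invoices (
    invoice_id  integer PRIMARY KEY,
    customer_id integer        NOT NULL,
    issued_at   timestamptz    NOT NULL,
    amount      numeric(10, 2) NOT NULL
);

-- 10,000 invoices spread over 100 customers (customer_id 0..99).
INSERT INTO invoices (invoice_id, customer_id, issued_at, amount)
SELECT g,
       g % 100,
       timestamptz '2024-01-01 00:00+00' + g * interval '1 hour',
       (g % 50) + 0.99
FROM generate_series(1, 10000) AS g;

-- Index matching the DISTINCT ON query's ORDER BY, column by column.
CREATE INDEX invoices_customer_latest_idx
    ON invoices (customer_id, issued_at DESC);

-- Autovacuum does not process temporary tables, so gather statistics manually.
ANALYZE invoices;

-- Inspect which plan the planner chooses for "latest invoice per customer".
EXPLAIN
SELECT DISTINCT ON (customer_id) customer_id, issued_at, amount
FROM invoices
ORDER BY customer_id, issued_at DESC;
```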
7. Expert: Unexpected behavior and edge cases
Before reading on: do you think DISTINCT ON always returns deterministic results without ORDER BY? Commit to your answer.
Concept: DISTINCT ON requires careful ORDER BY to avoid unpredictable results.
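If ORDER BY does not fully determine the row order within each group (for example, two rows share the same timestamp), DISTINCT ON may return any of the tied rows, and the choice can change between runs or query plans. Joins add a similar trap: a one-to-many join multiplies the rows in each group, so the ORDER BY must still rank the joined rows completely.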
Result
You learn to write reliable queries and avoid subtle bugs.
Understanding the importance of ORDER BY completeness prevents hard-to-find errors in production.
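A sketch of the tie problem and its fix, using a hypothetical price_changes table in which product 1 has two changes with the same timestamp. Adding the unique change_id as a final ORDER BY column is one common way to make the pick deterministic; any column that is unique within the group would work.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE price_changes (
    change_id  integer PRIMARY KEY,
    product_id integer        NOT NULL,
    changed_at timestamptz    NOT NULL,
    price      numeric(10, 2) NOT NULL
);

-- Changes 2 and 3 for product 1 share the same timestamp.
INSERT INTO price_changes (change_id, product_id, changed_at, price) VALUES
    (1, 1, '2024-07-01 00:00+00',  9.99),
    (2, 1, '2024-07-02 00:00+00', 10.49),
    (3, 1, '2024-07-02 00:00+00', 10.99),
    (4, 2, '2024-07-01 00:00+00',  4.50);

-- Ambiguous: for product 1, either change 2 or change 3 may come back.
SELECT DISTINCT ON (product_id) product_id, changed_at, change_id, price
FROM price_changes
ORDER BY product_id, changed_at DESC;

-- Deterministic: change_id breaks the tie, so product 1 always yields change 3.
SELECT DISTINCT ON (product_id) product_id, changed_at, change_id, price
FROM price_changes
ORDER BY product_id, changed_at DESC, change_id DESC;
```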
Under the Hood
PostgreSQL evaluates DISTINCT ON over rows sorted according to ORDER BY, either by sorting them or by reading an index that already returns them in that order. Because the DISTINCT ON columns lead the sort, all rows of a group arrive next to each other. PostgreSQL therefore only has to compare each row's DISTINCT ON values with those of the previous row: the first row of each new group is returned, and the rest of the group is skipped. No list of previously seen groups is needed.
Why designed this way?
DISTINCT ON was designed to provide a simple, readable way to select one row per group without complex window functions or subqueries. It trades some flexibility for simplicity in common use cases. It is a PostgreSQL extension rather than part of the SQL standard, and most other databases do not offer it.
┌───────────────┐
│ Input rows    │
└──────┬────────┘
       │ ORDER BY sorts rows
       ▼
┌───────────────┐
│ Sorted rows   │
└──────┬────────┘
       │ Scan rows in order
       ▼
┌─────────────────────────────┐
│ For each row:               │
│ If key differs from the     │
│   previous row's key,       │
│   output the row            │
│ Else skip row               │
└─────────────────────────────┘
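The scan in the diagram can be spelled out in ordinary window-function SQL: sort, then keep a row only when its key differs from the previous row's key. This sketch, over a hypothetical tickets table, only illustrates that logic. It is not how PostgreSQL implements DISTINCT ON internally, but it returns the same rows as the DISTINCT ON query that follows it.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE tickets (
    ticket_id   integer     PRIMARY KEY,
    customer_id integer     NOT NULL,
    opened_at   timestamptz NOT NULL
);

INSERT INTO tickets (ticket_id, customer_id, opened_at) VALUES
    (1, 1, '2024-08-01 09:00+00'),
    (2, 1, '2024-08-03 14:00+00'),
    (3, 2, '2024-08-02 11:00+00'),
    (4, 3, '2024-08-01 16:00+00'),
    (5, 3, '2024-08-04 10:00+00');

-- Emulated scan: a row starts a new group when it is the very first row
-- (this also covers a NULL key there) or its key differs from the previous row's.
SELECT ticket_id, customer_id, opened_at
FROM (
    SELECT ticket_id, customer_id, opened_at,
           (row_number() OVER w = 1)
           OR (customer_id IS DISTINCT FROM lag(customer_id) OVER w) AS starts_group
    FROM tickets
    WINDOW w AS (ORDER BY customer_id, opened_at DESC, ticket_id DESC)
) AS scanned
WHERE starts_group
ORDER BY customer_id;

-- The DISTINCT ON version of the same request.
SELECT DISTINCT ON (customer_id) ticket_id, customer_id, opened_at
FROM tickets
ORDER BY customer_id, opened_at DESC, ticket_id DESC;
-- Both return tickets 2, 3 and 5.
```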
Myth Busters - 4 Common Misconceptions
Quick: Does DISTINCT ON return all unique rows or just one per group? Commit to your answer.
Common Belief: DISTINCT ON returns all unique rows like DISTINCT does.
Reality: DISTINCT ON returns only the first row per group defined by the columns, not all unique rows.
Why it matters: Misunderstanding this leads to expecting more rows than are returned, causing confusion and wrong query results.
Quick: Can you use DISTINCT ON without ORDER BY and get consistent results? Commit to your answer.
Common Belief: DISTINCT ON works fine without ORDER BY and always returns the same rows.
Reality: Without ORDER BY, DISTINCT ON returns unpredictable rows per group because row order is not guaranteed.
Why it matters: This can cause inconsistent outputs and bugs that are hard to reproduce.
Quick: Is DISTINCT ON supported in all SQL databases? Commit to your answer.
Common Belief: DISTINCT ON is a standard SQL feature available everywhere.
Reality: DISTINCT ON is a PostgreSQL extension, not part of the SQL standard, and most other SQL databases do not support it.
Why it matters: Relying on DISTINCT ON reduces the portability of SQL code across database systems.
Quick: Does DISTINCT ON always perform better than window functions? Commit to your answer.
Common Belief: DISTINCT ON is always faster than window functions for unique-per-group queries.
Reality: Performance depends on data size, indexes, and query complexity; sometimes window functions are faster.
Why it matters: Assuming DISTINCT ON is always best can lead to inefficient queries in some cases.
Expert Zone
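1. When a DISTINCT ON query has an ORDER BY, the ORDER BY must begin with the DISTINCT ON expressions; otherwise PostgreSQL rejects the query with "SELECT DISTINCT ON expressions must match initial ORDER BY expressions". Unpredictable results come from a different problem: an ORDER BY that is missing, or that stops before it fully ranks the rows within each group.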
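2. When using DISTINCT ON with joins, a one-to-many join multiplies the rows in each group, and any tie left in the ORDER BY is resolved arbitrarily. Order by columns that fully rank the joined rows, and filter before DISTINCT ON where possible, to get the intended row per group.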
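3. LIMIT is applied after DISTINCT ON, so it limits the number of groups returned. Because ORDER BY has to start with the DISTINCT ON columns, those groups come back in key order. To rank the groups by something else (for example, by their latest date), run DISTINCT ON in a subquery and apply ORDER BY and LIMIT in the outer query.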
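A sketch combining both points above, assuming hypothetical authors and posts tables. The join produces several rows per author, and the ORDER BY ranks them fully (post_id breaks date ties). Putting LIMIT directly on the DISTINCT ON query returns the first groups by author_id. Wrapping it in a subquery lets the outer query pick the authors who posted most recently.

```sql
-- Hypothetical tables and data, for illustration only.
CREATE TEMP TABLE authors (
    author_id integer PRIMARY KEY,
    name      text    NOT NULL
);

CREATE TEMP TABLE posts (
    post_id   integer PRIMARY KEY,
    author_id integer NOT NULL,
    posted_at date    NOT NULL
);

INSERT INTO authors (author_id, name) VALUES (1, 'ana'), (2, 'ben'), (3, 'cy');

INSERT INTO posts (post_id, author_id, posted_at) VALUES
    (1, 1, '2024-09-01'),
    (2, 1, '2024-09-10'),
    (3, 2, '2024-09-05'),
    (4, 3, '2024-09-12'),
    (5, 3, '2024-09-02');

-- LIMIT counts groups in DISTINCT ON key order: authors 1 and 2,
-- not the two authors who posted most recently.
SELECT DISTINCT ON (a.author_id) a.author_id, a.name, p.post_id, p.posted_at
FROM authors AS a
JOIN posts AS p ON p.author_id = a.author_id
ORDER BY a.author_id, p.posted_at DESC, p.post_id DESC
LIMIT 2;

-- One row per author first, then re-sort the groups and limit.
SELECT author_id, name, post_id, posted_at
FROM (
    SELECT DISTINCT ON (a.author_id) a.author_id, a.name, p.post_id, p.posted_at
    FROM authors AS a
    JOIN posts AS p ON p.author_id = a.author_id
    ORDER BY a.author_id, p.posted_at DESC, p.post_id DESC
) AS latest_per_author
ORDER BY posted_at DESC
LIMIT 2;
-- cy's post 4 (2024-09-12), then ana's post 2 (2024-09-10)
```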
When NOT to use
Avoid DISTINCT ON when you need multiple rows per group or complex ranking beyond the first row. Use window functions like ROW_NUMBER() or RANK() for more flexible row selection. Also, if you need portable SQL across databases, prefer standard SQL techniques.
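For "more than one row per group", a sketch with a hypothetical exam_results table shows the window-function pattern that replaces DISTINCT ON. ROW_NUMBER() returns exactly two rows per class, using student as an illustrative tie-breaker. RANK() gives tied scores the same rank, so tied students are all kept.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE exam_results (
    class_id integer NOT NULL,
    student  text    NOT NULL,
    score    integer NOT NULL
);

INSERT INTO exam_results (class_id, student, score) VALUES
    (1, 'ana', 95),
    (1, 'ben', 88),
    (1, 'cy',  88),
    (1, 'dee', 70),
    (2, 'eve', 91),
    (2, 'fay', 84);

-- Top two per class with ROW_NUMBER(): exactly two rows per class.
SELECT class_id, student, score
FROM (
    SELECT class_id, student, score,
           row_number() OVER (PARTITION BY class_id
                              ORDER BY score DESC, student) AS rn
    FROM exam_results
) AS ranked
WHERE rn <= 2
ORDER BY class_id, rn;
-- class 1: ana 95, ben 88; class 2: eve 91, fay 84

-- Top two ranks with RANK(): ties share a rank, so both 88s survive.
SELECT class_id, student, score
FROM (
    SELECT class_id, student, score,
           rank() OVER (PARTITION BY class_id ORDER BY score DESC) AS rnk
    FROM exam_results
) AS ranked
WHERE rnk <= 2
ORDER BY class_id, score DESC, student;
-- class 1: ana 95, ben 88, cy 88; class 2: eve 91, fay 84
```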
Production Patterns
In real systems, DISTINCT ON is often used to get the latest record per user, the first event per session, or the top score per category. It is combined with indexes on grouping and ordering columns for performance. Developers also use it in reporting queries where a simple unique per group row is needed quickly.
Connections
Window Functions
DISTINCT ON and window functions both select rows per group but use different methods.
Understanding DISTINCT ON helps grasp window functions like ROW_NUMBER(), which generalize unique per group selection with more flexibility.
Data Deduplication
DISTINCT ON is a form of deduplication by choosing one row per group.
Knowing how DISTINCT ON works clarifies how deduplication can be done efficiently in databases.
Sorting Algorithms
DISTINCT ON relies on sorting rows before picking unique groups.
Understanding sorting's role in DISTINCT ON deepens appreciation of how data order affects query results.
Common Pitfalls
#1 Missing ORDER BY leads to unpredictable row selection.
Wrong approach: SELECT DISTINCT ON (customer_id) * FROM orders;
Correct approach: SELECT DISTINCT ON (customer_id) * FROM orders ORDER BY customer_id, order_date DESC;
Root cause: Without ORDER BY, PostgreSQL has no rule for which row comes first in each group, so it returns an arbitrary row that can change between runs or query plans.
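#2 ORDER BY columns do not start with the DISTINCT ON columns.
Wrong approach: SELECT DISTINCT ON (customer_id) * FROM orders ORDER BY order_date DESC;
Correct approach: SELECT DISTINCT ON (customer_id) * FROM orders ORDER BY customer_id, order_date DESC;
Root cause: PostgreSQL requires the DISTINCT ON expressions to be the leftmost ORDER BY expressions and rejects the wrong version with the error "SELECT DISTINCT ON expressions must match initial ORDER BY expressions". Putting customer_id first keeps each customer's rows together, and order_date DESC then makes the latest order the first row of each group.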
#3 Expecting DISTINCT ON to work in other SQL databases.
Wrong approach: SELECT DISTINCT ON (customer_id) * FROM orders; -- in MySQL or SQL Server
Correct approach: Use a window function such as ROW_NUMBER() or a correlated subquery in databases without DISTINCT ON.
Root cause: DISTINCT ON is a PostgreSQL extension, not part of standard SQL.
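The correlated-subquery alternative can be sketched as follows, assuming a hypothetical customer_orders table. The setup statements are written for PostgreSQL so the example can be tried there, while the final SELECT uses only plain SQL (a correlated subquery with MAX). Unlike DISTINCT ON, it returns every row that ties on the latest date, so customer 2 appears twice.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMPORARY TABLE customer_orders (
    order_id    integer PRIMARY KEY,
    customer_id integer NOT NULL,
    order_date  date    NOT NULL
);

INSERT INTO customer_orders (order_id, customer_id, order_date) VALUES
    (1, 1, '2024-01-05'),
    (2, 1, '2024-03-01'),
    (3, 2, '2024-02-14'),
    (4, 2, '2024-02-14');

-- Latest order per customer without DISTINCT ON: keep rows whose date
-- equals that customer's maximum date.
SELECT o.customer_id, o.order_id, o.order_date
FROM customer_orders AS o
WHERE o.order_date = (
    SELECT MAX(o2.order_date)
    FROM customer_orders AS o2
    WHERE o2.customer_id = o.customer_id
)
ORDER BY o.customer_id, o.order_id;
-- 1 | 2 | 2024-03-01
-- 2 | 3 | 2024-02-14
-- 2 | 4 | 2024-02-14   <- tie: both rows are returned
```

If exactly one row per customer is required, the ROW_NUMBER() pattern with a tie-breaking column is the closer match to DISTINCT ON.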
Key Takeaways
DISTINCT ON in PostgreSQL selects the first row per group defined by specified columns, based on ORDER BY sorting.
When ORDER BY is used, it must start with the DISTINCT ON columns (PostgreSQL enforces this); the ORDER BY columns that follow decide which row each group keeps.
DISTINCT ON simplifies queries that need one unique row per group, avoiding complex window functions or subqueries.
Without an ORDER BY that fully ranks the rows in each group, DISTINCT ON can return unpredictable rows, causing bugs.
DISTINCT ON is a PostgreSQL extension that most other SQL databases do not support, so know when to use portable alternatives.