0
0
PostgresqlHow-ToBeginner · 3 min read

How to Use DISTINCT ON in PostgreSQL: Syntax and Examples

In PostgreSQL, use DISTINCT ON (column_name) to return the first row for each unique value in the specified column(s). It requires an ORDER BY clause to determine which row to keep for each distinct group.
📐

Syntax

The DISTINCT ON clause selects the first row of each set of rows where the specified columns have the same values. It must be followed by an ORDER BY clause that defines the order of rows within each group.

Parts explained:

  • DISTINCT ON (column1, column2, ...): Columns to find unique groups.
  • ORDER BY column1, column2, ...: Defines which row to pick first in each group.
sql
SELECT DISTINCT ON (column_name) column_name, other_columns
FROM table_name
ORDER BY column_name, other_columns;
💻

Example

This example shows how to select the first order for each customer based on the earliest order date.

sql
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INT,
  order_date DATE
);

INSERT INTO orders (customer_id, order_date) VALUES
(1, '2024-01-10'),
(1, '2024-02-15'),
(2, '2024-01-20'),
(2, '2024-01-25'),
(3, '2024-03-01');

SELECT DISTINCT ON (customer_id) customer_id, order_date
FROM orders
ORDER BY customer_id, order_date;
Output
customer_id | order_date -------------+------------ 1 | 2024-01-10 2 | 2024-01-20 3 | 2024-03-01 (3 rows)
⚠️

Common Pitfalls

Common mistakes when using DISTINCT ON include:

  • Not including an ORDER BY clause, which causes an error.
  • Ordering by columns not in DISTINCT ON can lead to unexpected results.
  • Expecting DISTINCT ON to behave like DISTINCT (which removes all duplicates).

Correct usage requires ORDER BY starting with the same columns as DISTINCT ON.

sql
/* Wrong: Missing ORDER BY */
-- SELECT DISTINCT ON (customer_id) customer_id, order_date FROM orders;

/* Wrong: ORDER BY columns do not start with DISTINCT ON columns */
-- SELECT DISTINCT ON (customer_id) customer_id, order_date FROM orders ORDER BY order_date;

/* Right: ORDER BY starts with DISTINCT ON columns */
SELECT DISTINCT ON (customer_id) customer_id, order_date FROM orders ORDER BY customer_id, order_date;
📊

Quick Reference

ClauseDescription
DISTINCT ON (columns)Selects the first row for each unique group of columns
ORDER BY columnsDefines which row to pick first within each group; must start with DISTINCT ON columns
UsageUse to get one row per group with control over which row is chosen
Difference from DISTINCTDISTINCT removes all duplicates; DISTINCT ON keeps first row per group

Key Takeaways

DISTINCT ON returns the first row for each unique value in specified columns.
Always use ORDER BY starting with the same columns as DISTINCT ON to control which row is returned.
DISTINCT ON is PostgreSQL-specific and differs from standard DISTINCT.
Without ORDER BY, DISTINCT ON will cause an error.
Use DISTINCT ON to simplify queries that need one row per group with custom ordering.