Given the table employees with columns department, employee_id, and salary, what will this query return?
SELECT DISTINCT ON (department) department, employee_id, salary FROM employees ORDER BY department, salary DESC;
CREATE TABLE employees (department TEXT, employee_id INT, salary INT); INSERT INTO employees VALUES ('Sales', 1, 5000), ('Sales', 2, 7000), ('HR', 3, 6000), ('HR', 4, 5500), ('IT', 5, 8000); SELECT DISTINCT ON (department) department, employee_id, salary FROM employees ORDER BY department, salary DESC;
Remember that DISTINCT ON (column) returns the first row of each group based on the ORDER BY clause.
The query uses DISTINCT ON (department) to pick one row per department. The ORDER BY department, salary DESC ensures the highest salary per department is first, so that row is returned.
In PostgreSQL, when using DISTINCT ON (column), why is it required that the ORDER BY clause starts with the same column(s) as in DISTINCT ON?
Think about how PostgreSQL picks the first row per group.
PostgreSQL uses the ORDER BY clause to decide which row is first in each group defined by DISTINCT ON. The ORDER BY must start with the DISTINCT ON columns to group rows correctly.
Given a table orders with columns customer_id, order_id, and order_date, which query returns the earliest order for each customer?
Remember the order of columns in ORDER BY must start with DISTINCT ON columns.
Option D correctly uses DISTINCT ON (customer_id) and orders by customer_id then order_date ascending to get the earliest order per customer.
Option D orders by order_date first, which breaks the DISTINCT ON requirement.
Option D uses DISTINCT (not DISTINCT ON), which does not group by customer_id.
Option D uses DISTINCT ON (order_date), which is incorrect for grouping by customer.
You have a large sales table with columns region, sale_id, and sale_date. You want to get the latest sale per region using DISTINCT ON. Which approach improves performance?
Think about how indexes can help ORDER BY and DISTINCT ON.
Creating an index on (region, sale_date DESC) matches the DISTINCT ON and ORDER BY columns, allowing PostgreSQL to efficiently find the first row per region.
Option A's index does not include region, so less efficient.
Option A has ORDER BY columns in wrong order, breaking DISTINCT ON.
Option A is a different approach but may be less efficient for returning full rows.
Consider this query:
SELECT DISTINCT ON (category) category, product_id, price FROM products ORDER BY price DESC, category;
It is intended to return the most expensive product per category, but it returns unexpected results. What is the problem?
Check the order of columns in ORDER BY compared to DISTINCT ON.
The ORDER BY clause must start with the DISTINCT ON columns in the same order. Here, ORDER BY starts with price, not category, so PostgreSQL picks the first row by price overall, not per category.