0
0
PostgreSQLquery~15 mins

Why subqueries are needed in PostgreSQL - Why It Works This Way

Choose your learning style9 modes available
Overview - Why subqueries are needed
What is it?
A subquery is a query written inside another query. It lets you use the result of one query as input for another. This helps break complex questions into smaller, manageable parts. Subqueries can appear in SELECT, FROM, or WHERE clauses.
Why it matters
Without subqueries, you would struggle to ask layered questions in a database. They let you find answers that depend on other answers, like finding customers who bought the most expensive product. Without them, queries would be longer, harder to write, and less clear.
Where it fits
Before learning subqueries, you should understand basic SELECT queries and filtering with WHERE. After subqueries, you can learn about JOINs, window functions, and query optimization to handle even more complex data questions.
Mental Model
Core Idea
A subquery is a question inside a question that helps solve complex data problems step-by-step.
Think of it like...
Imagine you want to find the tallest person in a room, then find who is sitting next to them. First, you find the tallest person (subquery), then you use that answer to find their neighbor (outer query).
┌───────────────┐
│ Outer Query   │
│  Uses result  │
│  from subquery│
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Subquery      │
│  Finds data   │
│  needed first │
└───────────────┘
Build-Up - 7 Steps
1
FoundationBasic SELECT Query Review
🤔
Concept: Understanding how to retrieve data from a table using simple SELECT statements.
A SELECT query asks the database to give you certain columns from a table. For example, SELECT name FROM customers; returns all customer names.
Result
A list of names from the customers table.
Knowing how to write simple queries is essential before adding complexity like subqueries.
2
FoundationFiltering Data with WHERE Clause
🤔
Concept: Learn how to narrow down results using conditions.
The WHERE clause lets you pick only rows that meet certain criteria. For example, SELECT name FROM customers WHERE age > 30; returns names of customers older than 30.
Result
A filtered list of customer names based on age.
Filtering is the first step to asking more precise questions, which subqueries build upon.
3
IntermediateIntroducing Subqueries in WHERE
🤔Before reading on: do you think you can use a query inside WHERE to compare values? Commit to yes or no.
Concept: Using a query inside the WHERE clause to compare each row against a set of values or a single value.
You can write a query inside WHERE to filter rows based on another query's result. For example, SELECT name FROM customers WHERE id IN (SELECT customer_id FROM orders WHERE total > 100); finds customers who made orders over 100.
Result
Names of customers who placed expensive orders.
Understanding that subqueries can dynamically filter data based on other data makes queries more powerful and flexible.
4
IntermediateSubqueries in FROM Clause
🤔Before reading on: do you think a subquery can act like a temporary table in FROM? Commit to yes or no.
Concept: Using a subquery as a temporary table to organize or summarize data before the main query uses it.
A subquery in FROM creates a temporary result set. For example, SELECT avg_total FROM (SELECT customer_id, AVG(total) AS avg_total FROM orders GROUP BY customer_id) AS avg_orders WHERE avg_total > 50; finds customers with average order totals above 50.
Result
List of average order totals above 50 per customer.
Seeing subqueries as temporary tables helps break complex queries into understandable parts.
5
IntermediateSubqueries in SELECT Clause
🤔Before reading on: can a subquery return a single value for each row in SELECT? Commit to yes or no.
Concept: Using subqueries to calculate values for each row in the main query's result.
You can put a subquery in SELECT to get related data. For example, SELECT name, (SELECT COUNT(*) FROM orders WHERE orders.customer_id = customers.id) AS order_count FROM customers; shows each customer's name and how many orders they made.
Result
A list of customers with their order counts.
Using subqueries in SELECT lets you enrich each row with extra calculated data.
6
AdvancedCorrelated vs Non-Correlated Subqueries
🤔Before reading on: do you think subqueries can depend on each row of the outer query? Commit to yes or no.
Concept: Understanding the difference between subqueries that run once (non-correlated) and those that run for each row (correlated).
Non-correlated subqueries run once and return a result used by the outer query. Correlated subqueries use values from the outer query and run repeatedly. For example, a correlated subquery counts orders per customer by referencing the outer customer's id.
Result
More precise and dynamic query results depending on row context.
Knowing this difference helps write efficient queries and avoid performance issues.
7
ExpertPerformance Implications and Optimization
🤔Before reading on: do you think all subqueries perform equally well? Commit to yes or no.
Concept: Understanding how subqueries affect query speed and how to optimize or rewrite them.
Subqueries, especially correlated ones, can slow down queries because they run multiple times. Sometimes rewriting subqueries as JOINs or using WITH (CTE) clauses improves performance. PostgreSQL query planner decides the best way but knowing this helps write better queries.
Result
Faster queries and better resource use in real applications.
Understanding performance helps avoid slow queries and scale databases effectively.
Under the Hood
When a query with a subquery runs, the database executes the inner query first (for non-correlated subqueries) or repeatedly for each row (for correlated subqueries). The results are stored temporarily and used by the outer query to filter, join, or calculate values. The query planner analyzes the whole query to decide the best execution plan, sometimes flattening subqueries or converting them to joins.
Why designed this way?
Subqueries were introduced to let users express complex questions naturally and modularly. Instead of writing one huge query, breaking it into parts makes it easier to understand and maintain. Alternatives like joins existed but subqueries offer more flexibility in some cases, especially for filtering and aggregation.
┌───────────────┐
│ Outer Query   │
│  Uses subquery│
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Subquery      │
│  Executes     │
│  First        │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Temporary     │
│  Result Set   │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think subqueries always run faster than joins? Commit to yes or no.
Common Belief:Subqueries are always faster and better than joins.
Tap to reveal reality
Reality:Subqueries can be slower, especially correlated ones, because they may run multiple times. Joins are often more efficient.
Why it matters:Believing this can lead to writing slow queries that hurt application performance.
Quick: Can subqueries return multiple columns in WHERE clause? Commit to yes or no.
Common Belief:Subqueries in WHERE can return multiple columns to compare with outer query.
Tap to reveal reality
Reality:Subqueries in WHERE must return a single column or a single value to compare properly.
Why it matters:Trying to return multiple columns causes errors and confusion in query writing.
Quick: Do you think subqueries always run before the outer query? Commit to yes or no.
Common Belief:All subqueries run completely before the outer query starts.
Tap to reveal reality
Reality:Correlated subqueries run repeatedly for each row of the outer query, not just once before it.
Why it matters:Misunderstanding this leads to unexpected slow queries and difficulty optimizing.
Quick: Can subqueries replace all joins? Commit to yes or no.
Common Belief:Subqueries can always replace joins without any difference.
Tap to reveal reality
Reality:Some queries are more naturally and efficiently expressed with joins; subqueries are not always a substitute.
Why it matters:Using subqueries blindly can make queries harder to read and slower.
Expert Zone
1
Some correlated subqueries can be internally transformed by PostgreSQL into joins for better performance, but this depends on query structure.
2
Using WITH (Common Table Expressions) can make complex subqueries more readable and sometimes optimize execution by materializing results.
3
Subqueries in SELECT that return multiple rows cause errors; understanding scalar vs set-returning subqueries is key for advanced query writing.
When NOT to use
Avoid subqueries when performance is critical and the same result can be achieved with joins or window functions. For large datasets, correlated subqueries often slow down queries. Instead, use JOINs, CTEs, or window functions for better efficiency.
Production Patterns
In real systems, subqueries are used for filtering based on aggregated data, calculating per-row metrics, and isolating complex logic. Developers often combine subqueries with CTEs for clarity and maintainability. Query planners in PostgreSQL optimize many subqueries, but understanding when to rewrite them is crucial for production performance.
Connections
Functional Programming
Subqueries are like nested function calls where one function's output feeds another.
Understanding subqueries as nested operations helps grasp how data flows and dependencies work in queries.
Mathematical Set Theory
Subqueries often represent subsets or filtered sets used within larger sets.
Knowing set operations clarifies how subqueries filter or aggregate data within the whole dataset.
Project Management
Breaking a big task into smaller subtasks mirrors how subqueries break complex queries into parts.
Seeing subqueries as modular steps helps appreciate their role in managing complexity.
Common Pitfalls
#1Using a subquery in WHERE that returns multiple columns.
Wrong approach:SELECT name FROM customers WHERE (SELECT id, age FROM orders) = customers.id;
Correct approach:SELECT name FROM customers WHERE id IN (SELECT customer_id FROM orders);
Root cause:Misunderstanding that WHERE expects a single column or value for comparison.
#2Writing a correlated subquery that causes slow performance.
Wrong approach:SELECT name FROM customers WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.id AND orders.total > 100);
Correct approach:WITH expensive_orders AS (SELECT DISTINCT customer_id FROM orders WHERE total > 100) SELECT name FROM customers JOIN expensive_orders ON customers.id = expensive_orders.customer_id;
Root cause:Not realizing correlated subqueries run once per row, causing inefficiency.
#3Expecting subqueries to always run before outer query.
Wrong approach:Assuming SELECT * FROM customers WHERE id = (SELECT max(id) FROM customers); runs subquery once but using correlated subquery instead.
Correct approach:SELECT * FROM customers WHERE id = (SELECT max(id) FROM customers);
Root cause:Confusing correlated and non-correlated subqueries and their execution timing.
Key Takeaways
Subqueries let you ask layered questions by embedding one query inside another.
They make complex data retrieval easier to write and understand by breaking problems into parts.
Subqueries can appear in WHERE, FROM, or SELECT clauses, each serving different purposes.
Understanding correlated versus non-correlated subqueries is key to writing efficient queries.
Knowing when to use subqueries versus joins or CTEs helps optimize performance in real databases.