Overview - Correlated subqueries execution model

What is it?

A correlated subquery is a query nested inside another query that depends on values from the outer query. Logically, it is evaluated once for each row processed by the outer query, using that row's data to filter or calculate results. This makes it different from a regular (uncorrelated) subquery, which can be evaluated just once. Correlated subqueries help answer questions where each row needs its own check or calculation.

Why it matters

Correlated subqueries make it possible to compare each row against its related data inside a single query, for example an employee's salary against the average salary of that employee's department. Without them, developers would have to write more complex SQL or issue multiple queries from application code, which tends to be slower and more error-prone.

Where it fits

Before learning correlated subqueries, you should understand basic SQL queries, subqueries, and how joins work. After mastering correlated subqueries, you can explore query optimization, window functions, and advanced SQL performance tuning.

Mental Model

Core Idea

A correlated subquery is like a mini-question asked repeatedly for each row of the main query, using that row's details to find a specific answer.

Think of it like...

Imagine you are checking each student's grades and, for every student, you ask the teacher separately how many assignments that student completed. You ask the same question many times, but each time with a different student's name.

Outer Query Row ──▶ Correlated Subquery using Outer Row Value
┌───────────────┐       ┌────────────────────────────┐
│ Outer Query   │       │ Subquery uses outer row    │
│ processes row │──────▶│ value to filter or compute │
└───────────────┘       └────────────────────────────┘
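The student analogy maps directly onto a scalar correlated subquery in the SELECT list. The sketch below runs it in an in-memory SQLite database through Python's standard `sqlite3` module, used here only as a convenient stand-in for PostgreSQL; the `students` and `assignments` tables and their columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL; the
# correlated-subquery syntax used here is standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE assignments (id INTEGER PRIMARY KEY, student_id INTEGER, completed INTEGER);
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO assignments VALUES
        (1, 1, 1), (2, 1, 1), (3, 1, 0),
        (4, 2, 1),
        (5, 3, 0);
""")

# The inner SELECT references s.id from the outer row, so it is logically
# re-evaluated for every student ("ask the teacher again" for each name).
rows = conn.execute("""
    SELECT s.name,
           (SELECT COUNT(*)
              FROM assignments a
             WHERE a.student_id = s.id
               AND a.completed = 1) AS completed_count
      FROM students s
     ORDER BY s.id
""").fetchall()

print(rows)  # [('Ana', 2), ('Ben', 1), ('Cy', 0)]
```

COUNT(*) over no matching rows yields 0, so a student with no completed assignments still appears in the output.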

Build-Up - 6 Steps

Step 1 (Foundation): Understanding Basic Subqueries

Concept: Learn what a subquery is and how it runs independently inside a main query.

A subquery is a query inside another query, usually in the WHERE, SELECT, or FROM clause. An uncorrelated subquery does not reference the outer query, so it can run once and return a value or set of values that the outer query then uses. For example, finding employees who work in the department with the highest budget uses a subquery to find that department.
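A minimal sketch of the department-budget example, again using SQLite through Python's `sqlite3` as a stand-in for PostgreSQL. The `departments` and `employees` tables are hypothetical. The inner query never mentions the outer table, so it can be evaluated once and its single result reused for every employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, budget INTEGER);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales', 100), (2, 'Research', 500);
    INSERT INTO employees VALUES (1, 'Ana', 1), (2, 'Ben', 2), (3, 'Cy', 2);
""")

# Uncorrelated subquery: it references only `departments`, not the outer
# `employees` row, so its answer (the id of the top-budget department) is
# the same for every outer row. Ties on budget are ignored by LIMIT 1.
rows = conn.execute("""
    SELECT name
      FROM employees
     WHERE department_id = (SELECT id
                              FROM departments
                             ORDER BY budget DESC
                             LIMIT 1)
     ORDER BY id
""").fetchall()

names = [name for (name,) in rows]
print(names)  # ['Ben', 'Cy']
```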

Result

The outer query uses the subquery's result to filter or calculate its output.

Understanding that subqueries run independently helps you see why correlated subqueries behave differently.

Step 2 (Foundation): Introducing Correlated Subqueries

Step 3 (Intermediate): Execution Flow of Correlated Subqueries

Step 4 (Intermediate): Correlated Subqueries vs Joins

Step 5 (Advanced): Optimization Techniques for Correlated Subqueries

Step 6 (Expert): Internal Execution Model and Planning

Under the Hood

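When a correlated subquery runs, PostgreSQL processes the outer query row by row. For each row, it passes the outer row's values into the subquery as parameters and executes it; EXPLAIN shows such a subquery as a SubPlan. This repeated execution behaves like a nested loop join. The planner can also avoid it: EXISTS and IN subqueries in WHERE can be pulled up into semi-joins, and NOT EXISTS into anti-joins, which may then use hash or merge strategies, and since PostgreSQL 14 a parameterized nested loop can use a Memoize node to cache inner results for repeated outer values. The subquery cannot be executed independently because it references outer query columns.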
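The row-by-row model can be sketched in plain Python. This is a conceptual model of naive nested-loop evaluation, not PostgreSQL's executor, and it deliberately ignores the planner rewrites described above; the `Employee` record and the sample data are hypothetical. It mirrors `SELECT e.name FROM employees e WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id)` and counts how many times the inner query runs.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    department_id: int
    salary: float


# Hypothetical sample table.
EMPLOYEES = [
    Employee("Ana", 1, 50.0),
    Employee("Ben", 1, 70.0),
    Employee("Cy", 2, 40.0),
    Employee("Di", 2, 60.0),
]


def dept_avg_subquery(table, outer_department_id):
    """Inner query with the outer value substituted as a parameter:
    SELECT AVG(salary) FROM employees WHERE department_id = $1"""
    salaries = [e.salary for e in table if e.department_id == outer_department_id]
    return sum(salaries) / len(salaries)


def above_department_average(table):
    """Outer query: fetch one row, run the inner query for it, filter, repeat."""
    result, inner_runs = [], 0
    for outer in table:
        inner_runs += 1
        avg = dept_avg_subquery(table, outer.department_id)
        if outer.salary > avg:
            result.append(outer.name)
    return result, inner_runs


names, inner_runs = above_department_average(EMPLOYEES)
print(names, inner_runs)  # ['Ben', 'Di'] 4
```

The inner query runs four times even though there are only two distinct departments, and each run scans the whole table; that repeated work is what pre-aggregation, semi-join conversion, or caching tries to remove.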

Why designed this way?

Correlated subqueries were designed to allow flexible, row-dependent filtering and calculations within a single SQL statement. Early SQL standards included them to express queries that joins alone couldn't easily handle. The nested execution model is simple and intuitive but can be inefficient, so modern databases add optimizations. Alternatives like lateral joins and window functions have emerged to address performance and expressiveness.

┌───────────────┐
│ Outer Query   │
│ fetches row 1 │
└──────┬────────┘
       │ uses row 1 values
       ▼
┌───────────────┐
│ Subquery runs │
│ with row 1    │
└──────┬────────┘
       │ returns result
       ▼
┌───────────────┐
│ Outer Query   │
│ fetches row 2 │
└──────┬────────┘
       │ uses row 2 values
       ▼
┌───────────────┐
│ Subquery runs │
│ with row 2    │
└───────────────┘

Myth Busters - 4 Common Misconceptions

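Quick: Does a correlated subquery run only once per query or once per outer row?

Common Belief: Correlated subqueries run only once, like normal subqueries.

Reality: Logically, a correlated subquery is evaluated once for each outer row, because its result depends on that row's values. Only an uncorrelated subquery can be evaluated a single time and reused.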

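Quick: Can all correlated subqueries be replaced by joins without changing results?

Common Belief: Every correlated subquery can be rewritten as a join with the same result and performance.

Reality: No. A join returns one row per match, so rewriting EXISTS as a plain join duplicates outer rows whenever the inner side has several matches, and a correlated aggregate needs a GROUP BY before it can be joined. Even when the results match, the plans and their costs can differ.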

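Quick: Does PostgreSQL always execute correlated subqueries as nested loops?

Common Belief: The database always executes correlated subqueries as simple nested loops without optimization.

Reality: No. The planner can pull EXISTS and IN subqueries up into semi-joins, and NOT EXISTS into anti-joins, and then choose hash, merge, or nested loop strategies. Scalar correlated subqueries, such as one computing an average, are typically still executed once per outer row.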

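Quick: Are correlated subqueries always slower than joins?

Common Belief: Correlated subqueries are always slower than equivalent joins.

Reality: No. A correlated EXISTS that becomes a semi-join can stop at the first match for each outer row, and an indexed per-row probe over a modest number of outer rows is often cheap. Performance depends on the chosen plan, so compare EXPLAIN output rather than assuming.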

Expert Zone

1

Correlated subqueries can sometimes be optimized into semi-joins or anti-joins internally, which changes their execution cost drastically.

2

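The planner's choice to cache inner results (for example, with a Memoize node in PostgreSQL 14 and later) depends on the volatility of the functions involved and on data statistics, which affects how often the inner query actually executes.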

3

Using lateral joins can express correlated subqueries more explicitly and sometimes improve performance and clarity.
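To make the first point concrete: NOT EXISTS has anti-join semantics, so it returns the same rows as the familiar `LEFT JOIN ... IS NULL` formulation, which is why a planner is free to execute it as an anti-join. The sketch checks that equivalence in SQLite through Python's `sqlite3` (a stand-in for PostgreSQL whose plan choices it does not reproduce); the `customers` and `orders` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Correlated NOT EXISTS: keep a customer only if the per-row probe finds nothing.
not_exists = conn.execute("""
    SELECT c.name
      FROM customers c
     WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
     ORDER BY c.id
""").fetchall()

# Join-based anti-join: unmatched customers get NULLs from the orders side.
left_join = conn.execute("""
    SELECT c.name
      FROM customers c
      LEFT JOIN orders o ON o.customer_id = c.id
     WHERE o.id IS NULL
     ORDER BY c.id
""").fetchall()

print(not_exists, left_join)  # [('Cy',)] [('Cy',)]
```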

When NOT to use

Avoid correlated subqueries when processing large datasets with many outer rows, as repeated execution can be costly. Instead, use joins, lateral joins, or window functions for better performance and scalability.
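As an example of the window-function alternative, the per-department average can be computed with `AVG(...) OVER (PARTITION BY ...)` instead of a per-row subquery. This sketch runs in SQLite through Python's `sqlite3` and assumes the bundled SQLite is 3.25 or newer (the first release with window functions); the `employees` data is hypothetical, and the same SQL is also valid in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER, salary REAL);
    INSERT INTO employees VALUES
        (1, 'Ana', 1, 50), (2, 'Ben', 1, 70), (3, 'Cy', 2, 40), (4, 'Di', 2, 60);
""")

# Correlated version: the AVG subquery is logically re-run for every employee.
correlated = conn.execute("""
    SELECT e.name
      FROM employees e
     WHERE e.salary > (SELECT AVG(salary) FROM employees
                        WHERE department_id = e.department_id)
     ORDER BY e.name
""").fetchall()

# Window-function version: department averages are computed alongside each
# row without a per-row subquery. The filter sits in an outer query because
# WHERE cannot reference window-function results directly.
windowed = conn.execute("""
    SELECT name
      FROM (SELECT name, salary,
                   AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
              FROM employees) AS t
     WHERE salary > dept_avg
     ORDER BY name
""").fetchall()

print(correlated, windowed)  # [('Ben',), ('Di',)] [('Ben',), ('Di',)]
```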

Production Patterns

In production, correlated subqueries are often used for filtering rows based on aggregates or existence checks per row. Developers monitor execution plans and rewrite queries to joins or lateral joins when performance issues arise. Indexing foreign keys and filtering early are common patterns to optimize correlated subqueries.
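A sketch of two of these patterns, using hypothetical `customers` and `orders` tables in SQLite through Python's `sqlite3`: an index on the foreign key `orders.customer_id` (the index name is hypothetical) so each per-row subquery can seek rather than scan, and a correlated aggregate in WHERE that keeps customers whose order total exceeds a threshold. In PostgreSQL you would confirm that the index is actually used by reading the EXPLAIN output; that step is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    -- Hypothetical index on the foreign key: each correlated probe
    -- "WHERE o.customer_id = c.id" can use it instead of scanning all orders.
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 30), (11, 1, 50), (12, 2, 20);
""")

THRESHOLD = 40

# Correlated aggregate filter: for each customer, sum only that customer's
# orders. COALESCE turns "no orders" (SUM returns NULL) into 0.
rows = conn.execute("""
    SELECT c.name
      FROM customers c
     WHERE (SELECT COALESCE(SUM(o.amount), 0)
              FROM orders o
             WHERE o.customer_id = c.id) > ?
     ORDER BY c.id
""", (THRESHOLD,)).fetchall()

print(rows)  # [('Ana',)]
```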

Connections

Nested Loop Join

Correlated subqueries are often executed internally like nested loop joins.

Understanding nested loop joins helps explain why correlated subqueries can be slow and how the database processes them step by step.

Lateral Joins

Lateral joins generalize correlated subqueries by allowing subqueries to reference outer query rows explicitly.

Knowing lateral joins helps write clearer and more efficient queries that behave like correlated subqueries but with more control.
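Python's built-in SQLite does not support LATERAL, so this is a plain-Python model of what a PostgreSQL query such as `SELECT c.name, o.id, o.placed_on FROM customers c CROSS JOIN LATERAL (SELECT id, placed_on FROM orders WHERE customer_id = c.id ORDER BY placed_on DESC LIMIT 2) o` computes; the table and column names are hypothetical. Unlike a scalar correlated subquery, the lateral subquery may return several rows and columns per outer row, and with CROSS JOIN LATERAL an outer row with no matches disappears from the result.

```python
# Hypothetical sample data: (id, name) and (id, customer_id, placed_on ISO date).
CUSTOMERS = [(1, "Ana"), (2, "Ben"), (3, "Cy")]
ORDERS = [
    (10, 1, "2024-01-05"),
    (11, 1, "2024-02-01"),
    (12, 1, "2024-03-09"),
    (13, 2, "2024-01-20"),
]


def latest_orders(customer_id, limit=2):
    """Lateral subquery body, re-run with each outer customer id:
    SELECT id, placed_on FROM orders WHERE customer_id = c.id
    ORDER BY placed_on DESC LIMIT 2"""
    mine = [(oid, placed) for oid, cid, placed in ORDERS if cid == customer_id]
    # ISO-8601 date strings sort chronologically as plain strings.
    return sorted(mine, key=lambda r: r[1], reverse=True)[:limit]


def cross_join_lateral():
    """Pair each outer row with every row its lateral subquery returns."""
    return [
        (name, oid, placed)
        for cid, name in CUSTOMERS
        for oid, placed in latest_orders(cid)
    ]


rows = cross_join_lateral()
print(rows)
# [('Ana', 12, '2024-03-09'), ('Ana', 11, '2024-02-01'), ('Ben', 13, '2024-01-20')]
```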

Functional Programming Closures

Correlated subqueries capture outer query variables like closures capture variables from outer scopes.

Recognizing this similarity clarifies how subqueries depend on outer data and why they cannot run independently.
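The closure analogy can be shown literally in Python: the inner function below plays the role of the subquery and captures `dept` from its enclosing scope, the way a correlated subquery captures `e.department_id`. The data and function names are hypothetical.

```python
# Hypothetical sample table.
EMPLOYEES = [
    {"name": "Ana", "department_id": 1, "salary": 50},
    {"name": "Ben", "department_id": 1, "salary": 70},
    {"name": "Cy", "department_id": 2, "salary": 40},
]


def bind_subquery(outer_row):
    """Return the inner query as a closure bound to one outer row."""
    dept = outer_row["department_id"]  # captured outer value, like e.department_id

    def dept_avg_salary():
        # Meaningless without `dept`, just as the SQL subquery cannot run
        # on its own without the outer row's department_id.
        salaries = [e["salary"] for e in EMPLOYEES if e["department_id"] == dept]
        return sum(salaries) / len(salaries)

    return dept_avg_salary


# One closure per outer row, each answering the same question for its own row.
subqueries = [bind_subquery(row) for row in EMPLOYEES]
averages = [q() for q in subqueries]
print(averages)  # [60.0, 60.0, 40.0]
```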

Common Pitfalls

#1 Writing a correlated subquery that runs inefficiently on large tables.

Wrong approach: SELECT e.name FROM employees e WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id);

Correct approach: WITH dept_avg AS (SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id) SELECT e.name FROM employees e JOIN dept_avg d ON e.department_id = d.department_id WHERE e.salary > d.avg_salary;

Root cause: Not realizing that the correlated subquery is evaluated once per employee, so the same department average is recomputed for every employee in that department.
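A quick way to check that the pre-aggregated rewrite preserves results is to run both queries on the same small data set. This sketch does so in SQLite through Python's `sqlite3` as a stand-in for PostgreSQL; the sample rows are hypothetical. The rewrite computes each department average once in `dept_avg` instead of once per employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER, salary REAL);
    INSERT INTO employees VALUES
        (1, 'Ana', 1, 50), (2, 'Ben', 1, 70), (3, 'Cy', 2, 40), (4, 'Di', 2, 60);
""")

# Original: AVG is recomputed for every employee row.
correlated = conn.execute("""
    SELECT e.name FROM employees e
     WHERE e.salary > (SELECT AVG(salary) FROM employees
                        WHERE department_id = e.department_id)
     ORDER BY e.name
""").fetchall()

# Rewrite: aggregate once per department, then join back.
pre_aggregated = conn.execute("""
    WITH dept_avg AS (
        SELECT department_id, AVG(salary) AS avg_salary
          FROM employees
         GROUP BY department_id)
    SELECT e.name FROM employees e
      JOIN dept_avg d ON e.department_id = d.department_id
     WHERE e.salary > d.avg_salary
     ORDER BY e.name
""").fetchall()

print(correlated == pre_aggregated, correlated)  # True [('Ben',), ('Di',)]
```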

#2 Assuming correlated subqueries always produce the same results as joins.

Original query: SELECT * FROM orders o WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.status = 'active');

Join rewrite: SELECT o.* FROM orders o JOIN customers c ON c.id = o.customer_id WHERE c.status = 'active';

Root cause: Treating EXISTS and JOIN as interchangeable without checking how many inner rows can match. This rewrite returns the same rows only because c.id identifies at most one customer per order; if the inner side could match several rows, the join would repeat outer rows while EXISTS would return each one once.
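Whether an EXISTS-to-join rewrite is safe depends on whether the inner side can match more than one row. This can be checked on hypothetical data in SQLite through Python's `sqlite3`: in the direction used above (orders probing customers by primary key) EXISTS and the join agree, but reversing the direction, so that one customer can match several orders, makes the join repeat customers while EXISTS returns each one once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'active'), (2, 'inactive');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")


def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# Orders -> customers: c.id is a primary key, so each order matches at most
# one customer and both forms return the same rows.
exists_orders = ids("""
    SELECT o.id FROM orders o
     WHERE EXISTS (SELECT 1 FROM customers c
                    WHERE c.id = o.customer_id AND c.status = 'active')
     ORDER BY o.id""")
join_orders = ids("""
    SELECT o.id FROM orders o
      JOIN customers c ON c.id = o.customer_id
     WHERE c.status = 'active'
     ORDER BY o.id""")

# Customers -> orders: customer 1 has two orders, so the join repeats it.
exists_customers = ids("""
    SELECT c.id FROM customers c
     WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
     ORDER BY c.id""")
join_customers = ids("""
    SELECT c.id FROM customers c
      JOIN orders o ON o.customer_id = c.id
     ORDER BY c.id""")

print(exists_orders, join_orders)        # [10, 11] [10, 11]
print(exists_customers, join_customers)  # [1, 2] [1, 1, 2]
```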

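#3 Using volatile functions inside subqueries, causing unexpected results.

Wrong approach: SELECT id FROM items WHERE price > (SELECT random() * 100);

Correct approach: SELECT id FROM items WHERE price > (SELECT AVG(price) FROM items);

Root cause: random() is volatile, so the threshold, and therefore the result set, changes every time the query runs. This particular subquery is not correlated, so PostgreSQL evaluates it once per query (an InitPlan in EXPLAIN output); inside a correlated subquery, a volatile function would instead be re-evaluated for every outer row. Use a deterministic expression when the filter must be reproducible.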

Key Takeaways

Correlated subqueries run once for each row of the outer query, using that row's data to compute results.

They allow expressing complex row-dependent logic that simple joins cannot easily handle.

Because they execute repeatedly, correlated subqueries can be slower and require careful optimization.

PostgreSQL's planner may optimize correlated subqueries by rewriting them into joins or caching results.

Understanding their execution model helps write efficient queries and troubleshoot performance issues.