0
0
PostgreSQLquery~15 mins

Subqueries in WHERE with IN in PostgreSQL - Deep Dive

Choose your learning style9 modes available
Overview - Subqueries in WHERE with IN
What is it?
A subquery in WHERE with IN is a way to filter data by checking if a value exists in a list returned by another query. It lets you ask a question inside another question, like asking if a value is among a set of results. This helps you find rows that match any of the values from the subquery. It is a powerful tool to connect data from different tables or conditions.
Why it matters
Without subqueries in WHERE with IN, you would have to manually combine data or write complex joins to filter results. This would be slow, error-prone, and hard to read. Using IN with subqueries makes queries simpler, more readable, and efficient. It helps you answer real questions like 'Which customers bought products from a certain category?' easily.
Where it fits
Before learning this, you should understand basic SELECT queries and simple WHERE filters. After this, you can learn about JOINs, EXISTS subqueries, and advanced filtering techniques. This concept is a stepping stone to mastering complex data retrieval in SQL.
Mental Model
Core Idea
A subquery in WHERE with IN checks if a value matches any value from a list produced by another query.
Think of it like...
Imagine you have a guest list for a party, and you want to check if a person is invited by looking if their name is on that list. The main query is asking about people, and the subquery is the guest list.
Main Query
  │
  ▼
Filter rows where value IN (Subquery results)
  │
  └─ Subquery: returns a list of values

Example:
SELECT * FROM orders
WHERE customer_id IN (
  SELECT id FROM customers WHERE city = 'Paris'
);
Build-Up - 7 Steps
1
FoundationUnderstanding Basic WHERE Filters
🤔
Concept: Learn how WHERE filters rows based on a condition.
The WHERE clause lets you pick rows that meet a condition. For example, SELECT * FROM products WHERE price > 100; returns products costing more than 100.
Result
Only rows with price greater than 100 are shown.
Knowing how WHERE works is essential because subqueries in WHERE build on this filtering concept.
2
FoundationIntroduction to Subqueries
🤔
Concept: A subquery is a query inside another query that returns data used by the outer query.
You can write a query inside parentheses. For example, SELECT id FROM customers WHERE city = 'Paris'; returns customer IDs from Paris. This can be used inside another query.
Result
The subquery returns a list of customer IDs from Paris.
Understanding subqueries lets you use dynamic lists of values instead of fixed ones.
3
IntermediateUsing IN with Static Lists
🤔
Concept: IN checks if a value matches any value in a fixed list.
Example: SELECT * FROM products WHERE category IN ('Books', 'Toys'); returns products in Books or Toys categories.
Result
Rows with category Books or Toys are selected.
IN simplifies checking multiple values without writing many OR conditions.
4
IntermediateCombining IN with Subqueries
🤔Before reading on: do you think the subquery runs once or multiple times for each row? Commit to your answer.
Concept: Use a subquery inside IN to filter rows based on dynamic lists from another table.
Example: SELECT * FROM orders WHERE customer_id IN ( SELECT id FROM customers WHERE city = 'Paris' ); This returns orders from customers living in Paris.
Result
Orders placed by customers from Paris are shown.
Knowing that the subquery produces a list used by IN helps you filter data across tables easily.
5
IntermediateSubquery Returns and Data Types
🤔
Concept: The subquery must return a single column with values compatible with the outer query's column.
If the outer query filters customer_id, the subquery must return customer IDs, not names or multiple columns. For example, SELECT id FROM customers is valid, but SELECT name FROM customers is not if filtering by customer_id.
Result
Query runs successfully only if data types match.
Matching data types prevents errors and ensures meaningful filtering.
6
AdvancedPerformance Considerations of IN Subqueries
🤔Before reading on: do you think IN with subqueries is always faster than JOINs? Commit to your answer.
Concept: IN with subqueries can be less efficient than JOINs in some cases, depending on database optimization.
Databases may execute IN subqueries by running the subquery once or multiple times. Sometimes rewriting with JOIN improves speed. EXPLAIN ANALYZE helps check performance.
Result
Understanding performance helps write faster queries.
Knowing when IN subqueries slow down queries guides you to better query design.
7
ExpertNULL Values and IN Subqueries Behavior
🤔Quick: If the subquery returns NULL, does IN return true, false, or unknown? Commit to your answer.
Concept: IN with subqueries behaves differently when NULLs appear, affecting query results unexpectedly.
If the subquery returns NULL among values, IN may return UNKNOWN, causing rows to be excluded. Use NOT IN carefully, as NULLs can cause no rows to match. IS NULL checks or EXISTS may be safer.
Result
Understanding NULL impact prevents subtle bugs in filtering.
Recognizing NULL's effect on IN avoids common logical errors in queries.
Under the Hood
When a query with WHERE ... IN (subquery) runs, the database first executes the subquery to get a list of values. Then, for each row in the outer query, it checks if the column's value is in that list. The database may optimize by caching the subquery results or rewriting the query internally to improve speed.
Why designed this way?
This design allows flexible filtering using dynamic sets of values without complex joins. It separates concerns: the subquery defines the set, and the outer query filters rows. Alternatives like JOINs exist but can be more complex or less readable. IN with subqueries balances expressiveness and simplicity.
┌───────────────┐       ┌───────────────┐
│ Outer Query   │       │ Subquery      │
│ SELECT * FROM │──────▶│ SELECT id FROM│
│ orders       │       │ customers     │
│ WHERE        │       │ WHERE city=   │
│ customer_id  │       │ 'Paris'       │
│ IN (Subquery)│       └───────────────┘
└───────────────┘
       │
       ▼
Filter orders where customer_id matches any id from subquery
Myth Busters - 4 Common Misconceptions
Quick: Does IN with a subquery always return rows if the subquery returns NULL? Commit yes or no.
Common Belief:If the subquery returns NULL, IN will still match rows as if NULL wasn't there.
Tap to reveal reality
Reality:If the subquery returns NULL, IN can return UNKNOWN, causing no rows to match.
Why it matters:This can cause queries to return empty results unexpectedly, leading to confusion and bugs.
Quick: Is IN with subqueries always faster than JOINs? Commit yes or no.
Common Belief:IN with subqueries is always more efficient than JOINs because it looks simpler.
Tap to reveal reality
Reality:Sometimes JOINs are faster because databases optimize joins better than IN subqueries.
Why it matters:Choosing the wrong approach can slow down applications and waste resources.
Quick: Does the subquery inside IN run once or for every row? Commit your answer.
Common Belief:The subquery runs once and its results are reused for all rows in the outer query.
Tap to reveal reality
Reality:While often the subquery runs once, some databases or query plans may run it multiple times, affecting performance.
Why it matters:Assuming it runs once can lead to unexpected slow queries if the subquery is expensive.
Quick: Can the subquery inside IN return multiple columns? Commit yes or no.
Common Belief:The subquery can return multiple columns and IN will check all of them.
Tap to reveal reality
Reality:The subquery must return exactly one column; multiple columns cause errors.
Why it matters:Misunderstanding this causes syntax errors and wasted debugging time.
Expert Zone
1
IN subqueries can sometimes be rewritten by the optimizer as semi-joins, which improves performance without changing query logic.
2
The presence of NULLs in subquery results can silently change query outcomes, so explicit NULL handling is crucial in production queries.
3
Using EXISTS instead of IN can avoid some pitfalls with NULLs and performance, but changes the logic subtly and must be chosen carefully.
When NOT to use
Avoid IN with subqueries when the subquery returns large datasets or when performance is critical; consider JOINs or EXISTS instead. Also, avoid IN if NULL values in subquery results are not handled properly; use EXISTS or explicit NULL checks.
Production Patterns
In real systems, IN with subqueries is often used for filtering by user permissions, categories, or related entities. It is common to combine with indexes on the subquery columns for speed. Sometimes, developers rewrite IN subqueries as JOINs or use materialized views for heavy queries.
Connections
JOINs in SQL
Alternative method to combine data from multiple tables, often interchangeable with IN subqueries.
Understanding JOINs helps you choose the best way to filter data, balancing readability and performance.
Set Theory in Mathematics
IN with subqueries represents checking membership in a set, a fundamental concept in set theory.
Knowing set membership concepts clarifies why IN works as a filter and how it relates to unions and intersections.
Filtering in Spreadsheet Software
Similar to filtering rows based on a list of values in spreadsheet filters or conditional formatting.
Recognizing this connection helps non-technical users grasp SQL filtering by relating it to familiar spreadsheet tasks.
Common Pitfalls
#1Using NOT IN without considering NULLs in subquery results.
Wrong approach:SELECT * FROM orders WHERE customer_id NOT IN (SELECT id FROM customers WHERE city = 'Paris');
Correct approach:SELECT * FROM orders WHERE customer_id NOT IN (SELECT id FROM customers WHERE city = 'Paris' AND id IS NOT NULL);
Root cause:NOT IN returns no rows if the subquery contains NULL, so filtering fails silently.
#2Subquery returns multiple columns causing syntax error.
Wrong approach:SELECT * FROM orders WHERE customer_id IN (SELECT id, name FROM customers);
Correct approach:SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers);
Root cause:IN expects a single column list; multiple columns cause errors.
#3Assuming subquery runs once when it runs multiple times, causing slow queries.
Wrong approach:SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE expensive_condition);
Correct approach:WITH customer_ids AS (SELECT id FROM customers WHERE expensive_condition) SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customer_ids);
Root cause:Not caching subquery results leads to repeated expensive computations.
Key Takeaways
Subqueries in WHERE with IN let you filter rows by checking if a value is in a dynamic list from another query.
The subquery must return a single column with compatible data types to the outer query's column.
NULL values in subquery results can cause unexpected behavior, especially with NOT IN, so handle NULLs carefully.
Performance of IN subqueries varies; sometimes JOINs or EXISTS are better choices.
Understanding how IN with subqueries works helps write clearer, more efficient, and correct SQL queries.