Join algorithms (nested loop, hash, merge) in PostgreSQL - Time & Space Complexity
Start learning this pattern below
Jump into concepts and practice - no test required
When databases combine tables using joins, the way they do it affects how long it takes. We want to understand how the time needed grows as the tables get bigger.
How does the choice of join method change the work done as data grows?
Analyze the time complexity of these three join methods in PostgreSQL.
-- Nested Loop Join
SELECT * FROM tableA a
JOIN tableB b ON a.id = b.a_id;
-- Hash Join
SELECT * FROM tableA a
JOIN tableB b ON a.id = b.a_id;
-- Merge Join
SELECT * FROM tableA a
JOIN tableB b ON a.id = b.a_id
ORDER BY a.id, b.a_id;
These queries join two tables on matching IDs using different join algorithms.
Each join method repeats operations differently:
- Nested Loop Join: For each row in tableA, it scans all rows in tableB.
- Hash Join: Builds a hash table for one table, then looks up matches for each row in the other.
- Merge Join: Both tables are sorted, then scanned once together to find matches.
- Primary operation: Comparing rows between tables to find matches.
- How many times: Nested loop does this many times (rows in A x rows in B), hash join once per row after hashing, merge join once per row in sorted order.
Imagine tableA and tableB grow in size:
| Input Size (rows in each table) | Nested Loop Ops | Hash Join Ops | Merge Join Ops |
|---|---|---|---|
| 10 | 100 (10x10) | ~20 (build + probe) | ~20 (sorted scan) |
| 100 | 10,000 (100x100) | ~200 (build + probe) | ~200 (sorted scan) |
| 1000 | 1,000,000 (1000x1000) | ~2000 (build + probe) | ~2000 (sorted scan) |
Nested loop grows very fast as tables get bigger, while hash and merge join grow more slowly, roughly proportional to the total rows.
Time Complexity: O(n x m) for nested loop, O(n + m) for hash and merge joins
This means nested loop work grows by multiplying table sizes, but hash and merge join work grows by adding sizes, making them faster for big tables.
[X] Wrong: "All join methods take the same time no matter table size."
[OK] Correct: Nested loop joins check every pair, so time grows fast with size. Hash and merge join use smarter ways to avoid checking all pairs, so they scale better.
Understanding how join methods scale helps you explain database performance clearly. This skill shows you know how data size affects query speed, a key part of working with databases.
"What if one table is much smaller than the other? How would that affect the time complexity of each join method?"
Practice
Solution
Step 1: Understand Nested Loop Join usage
Nested Loop Join works by scanning one table and for each row scanning the other table. It is efficient when one table is small.Step 2: Compare with other joins
Hash Join is better for large unsorted tables, Merge Join requires sorted inputs. Nested Loop is simplest and best for small tables.Final Answer:
Nested Loop Join -> Option CQuick Check:
Small table + Nested Loop Join = best [OK]
- Confusing Hash Join as best for small tables
- Thinking Merge Join works well without sorted data
- Assuming Index Join is a separate join algorithm
Solution
Step 1: Understand PostgreSQL join hints
PostgreSQL does not support inline join hints like /*+ HashJoin */ or HASH JOIN syntax.Step 2: Use configuration to enable Hash Join
We can enable or disable join types using SET commands, e.g., SET enable_hashjoin = on; before the query.Final Answer:
SET enable_hashjoin = on; SELECT ... -> Option BQuick Check:
PostgreSQL uses SET to enable join types [OK]
- Using Oracle-style hints like /*+ HashJoin */
- Trying to write HASH JOIN in SQL syntax
- Using USING HASH() which is invalid
employees(emp_id, dept_id) and departments(dept_id, name), what join algorithm will PostgreSQL most likely use for this query?EXPLAIN SELECT * FROM employees JOIN departments ON employees.dept_id = departments.dept_id;Assuming both tables are large and
departments.dept_id is indexed.Solution
Step 1: Analyze table sizes and indexes
Both tables are large, so Nested Loop is inefficient. Departments has an index on dept_id.Step 2: Determine join algorithm choice
Hash Join is preferred for large tables without sorted data. Merge Join requires sorted inputs, which is not guaranteed here.Final Answer:
Hash Join -> Option DQuick Check:
Large tables + no sorted data = Hash Join [OK]
- Assuming index forces Nested Loop Join
- Thinking Merge Join is automatic without sorting
- Confusing Cross Join with inner join
SELECT * FROM orders o JOIN customers c ON o.customer_id = c.customer_id;But PostgreSQL is using a Nested Loop Join causing slow performance. Which fix will most likely improve performance by enabling a better join algorithm?
Solution
Step 1: Identify why Nested Loop is slow
Nested Loop is slow on large tables without indexes or when better joins exist but are not chosen.Step 2: Force PostgreSQL to avoid Nested Loop
Disabling Nested Loop join with SET enable_nestloop = off forces PostgreSQL to pick Hash or Merge Join, improving performance.Final Answer:
Disable Nested Loop Join with SET enable_nestloop = off; -> Option AQuick Check:
Disable Nested Loop to force better join [OK]
- Assuming adding index always fixes join choice
- Changing JOIN type without understanding join algorithms
- Adding ORDER BY does not affect join algorithm
sales(date, product_id, amount) and products(product_id, name). You want to join them on product_id efficiently. Which join algorithm should you prefer and why?Solution
Step 1: Identify join algorithm suited for sorted tables
Merge Join is designed to efficiently join two sorted inputs by merging them in order.Step 2: Compare with other join algorithms
Nested Loop is inefficient for large tables, Hash Join ignores sorting, Cross Join produces Cartesian product.Final Answer:
Merge Join, because it exploits sorted order for fast merging -> Option AQuick Check:
Sorted tables + Merge Join = efficient join [OK]
- Choosing Nested Loop for large sorted tables
- Ignoring sorting and picking Hash Join
- Confusing Cross Join with inner join
