Join algorithms (nested loop, sort-merge, hash join) in DBMS Theory - Time & Space Complexity
Start learning this pattern below
Jump into concepts and practice - no test required
When databases combine tables using joins, the method chosen affects how long it takes. Understanding time complexity helps us see how the work grows as tables get bigger.
We want to know: How does the time to join tables change as the number of rows increases?
Analyze the time complexity of these three common join methods.
-- Nested Loop Join
FOR each row in TableA LOOP
FOR each row in TableB LOOP
IF join condition matches THEN
output joined row;
END IF;
END LOOP;
END LOOP;
-- Sort-Merge Join
Sort TableA and TableB on join key;
Merge rows by scanning both sorted tables once;
-- Hash Join
Build hash table on smaller table using join key;
Probe hash table with rows from larger table;
These snippets show how each join method processes rows to combine tables.
Look at the main repeated steps in each join:
- Nested Loop Join: Two loops, one inside the other, checking every pair of rows.
- Sort-Merge Join: Sorting both tables, then one pass through both sorted lists.
- Hash Join: Building a hash table from one table, then checking each row of the other table against it.
Imagine both tables have n rows:
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | Nested Loop: 100, Sort-Merge: ~66 + 20, Hash Join: ~10 + 10 |
| 100 | Nested Loop: 10,000, Sort-Merge: ~1,300 + 200, Hash Join: ~100 + 100 |
| 1000 | Nested Loop: 1,000,000, Sort-Merge: ~20,000 + 2,000, Hash Join: ~1000 + 1000 |
Nested loops grow very fast as tables get bigger, while sort-merge and hash join grow more slowly, roughly doubling or slightly more.
Time Complexity: O(n^2) for Nested Loop Join, O(n log n) for Sort-Merge Join, and O(n) for Hash Join
This means nested loops take much longer as tables grow, sorting takes more time but less than nested loops, and hashing is usually fastest for large tables.
[X] Wrong: "All join methods take the same time regardless of table size."
[OK] Correct: Different join methods handle data differently, so their time grows at different rates as tables get bigger.
Knowing how join methods scale helps you explain database performance clearly. This skill shows you understand how data size affects query speed, a key part of working with databases.
What if one table is much smaller than the other? How would that affect the time complexity of each join method?
Practice
Solution
Step 1: Understand the nested loop join process
Nested loop join works by taking each row from the first table and comparing it with every row in the second table to find matches.Step 2: Compare with other join types
Sort-merge join sorts and merges, hash join uses hashing, and index join uses indexes, so they do not compare every pair.Final Answer:
Nested loop join -> Option DQuick Check:
Every row compared = Nested loop join [OK]
- Confusing nested loop with hash join
- Thinking sort-merge compares all pairs
- Assuming index join is same as nested loop
Solution
Step 1: Recall hash join working
Hash join builds a hash table on one table's join key to quickly find matching rows from the other table.Step 2: Eliminate other options
Sorting and merging is sort-merge join, comparing all pairs is nested loop, and using indexes is index join, not hash join.Final Answer:
Use a hash table to find matching rows quickly -> Option BQuick Check:
Hash table = fast matching [OK]
- Mixing hash join with sort-merge join
- Thinking hash join compares all pairs
- Confusing index usage with hash join
Solution
Step 1: Analyze the condition of sorted tables
Since both tables are sorted on the join key, sort-merge join can efficiently merge them by scanning once through both tables.Step 2: Compare with other algorithms
Nested loop join checks all pairs (slow), hash join builds hash table (extra work), Cartesian join is unrelated and very expensive.Final Answer:
Sort-merge join -> Option CQuick Check:
Sorted tables = sort-merge join best [OK]
- Choosing nested loop for sorted tables
- Assuming hash join is always fastest
- Confusing Cartesian join with normal join
Solution
Step 1: Understand hash join memory use
Hash join builds a hash table in memory. If it is too large, it spills to disk, slowing performance.Step 2: Evaluate other options
Sorted tables do not slow hash join, unique keys help performance, and nested loop join is a different algorithm.Final Answer:
The hash table does not fit in memory causing disk spills -> Option AQuick Check:
Memory overflow = slow hash join [OK]
- Blaming sorted tables for hash join slowness
- Ignoring memory limits in hash join
- Confusing nested loop join with hash join
Solution
Step 1: Consider table sizes and memory
Table A fits in memory, so building a hash table on it is efficient. Table B is large and unsorted, so sorting it would be expensive.Step 2: Choose join algorithm minimizing disk I/O
Hash join uses the smaller table in memory to quickly find matches in the larger table, reducing disk reads compared to sorting or nested loops.Final Answer:
Hash join, because building a hash on smaller Table A allows fast matching -> Option AQuick Check:
Small table in memory = hash join best [OK]
- Choosing nested loop for large tables
- Preferring sort-merge without considering sorting cost
- Confusing Cartesian join with normal join
