Bird
Raised Fist0
DBMS Theoryknowledge~10 mins

Join algorithms (nested loop, sort-merge, hash join) in DBMS Theory - Step-by-Step Execution

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Concept Flow - Join algorithms (nested loop, sort-merge, hash join)
Start Join Operation
Nested Loop
Scan Outer
For each Outer
Compare & Output
Join Result
End
The join operation starts by choosing one of three methods: nested loop, sort-merge, or hash join. Each method processes the tables differently to find matching rows and produce the joined result.
Execution Sample
DBMS Theory
for each row R in OuterTable:
  for each row S in InnerTable:
    if R.key == S.key:
      output (R, S)
This code shows a nested loop join: for every row in the outer table, it checks every row in the inner table for matching keys and outputs the joined rows.
Analysis Table
StepOuter Row (R)Inner Row (S)Condition R.key == S.key?ActionOutput
1R1 (key=2)S1 (key=1)2 == 1? NoNo output
2R1 (key=2)S2 (key=2)2 == 2? YesOutput (R1,S2)(R1,S2)
3R1 (key=2)S3 (key=3)2 == 3? NoNo output
4R2 (key=3)S1 (key=1)3 == 1? NoNo output
5R2 (key=3)S2 (key=2)3 == 2? NoNo output
6R2 (key=3)S3 (key=3)3 == 3? YesOutput (R2,S3)(R2,S3)
7R3 (key=4)S1 (key=1)4 == 1? NoNo output
8R3 (key=4)S2 (key=2)4 == 2? NoNo output
9R3 (key=4)S3 (key=3)4 == 3? NoNo output
10EndAll rows checked
💡 All rows in OuterTable and InnerTable have been compared; join operation ends.
State Tracker
VariableStartAfter Step 2After Step 6After Step 9Final
R (Outer Row)NoneR1 (key=2)R2 (key=3)R3 (key=4)End
S (Inner Row)NoneS2 (key=2)S3 (key=3)S3 (key=3)End
OutputEmpty(R1,S2)(R1,S2),(R2,S3)(R1,S2),(R2,S3)(R1,S2),(R2,S3)
Key Insights - 3 Insights
Why does the nested loop join check every row in the inner table for each outer row?
Because nested loop join compares each outer row with all inner rows to find matches, as shown in execution_table rows 1-9 where every combination is checked.
How does the join know when to stop?
It stops after all outer rows have been checked against all inner rows, as indicated in execution_table step 10 where the process ends.
Why are some comparisons resulting in no output?
Only when the keys match (condition true) does the join output a row; otherwise, no output is produced, as seen in execution_table steps with 'No output'.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 2. What is the output?
ANo output
B(R1,S2)
C(R1,S1)
D(R2,S3)
💡 Hint
Check the 'Output' column at step 2 in the execution_table.
At which step does the join first output a matching pair?
AStep 6
BStep 1
CStep 2
DStep 10
💡 Hint
Look for the first 'Yes' in the 'Condition' column and corresponding output in execution_table.
If OuterTable had one more row with key=2, how would the output change?
AMore outputs with pairs having key=2
BNo change in output
COutputs with key=3 would increase
DJoin would stop earlier
💡 Hint
Refer to variable_tracker and execution_table showing outputs depend on matching keys.
Concept Snapshot
Join algorithms combine rows from two tables based on matching keys.
Nested Loop: checks every pair of rows.
Sort-Merge: sorts tables then merges matching keys.
Hash Join: builds a hash table on one table, probes with the other.
Each method balances speed and resource use differently.
Full Transcript
Join algorithms are methods to combine rows from two tables based on matching keys. The nested loop join checks every row in the outer table against every row in the inner table, outputting pairs when keys match. The process continues until all rows are checked. This method is simple but can be slow for large tables. Sort-merge join sorts both tables and then merges them by scanning in order, which is faster if tables are sorted. Hash join builds a hash table from one table's keys and then quickly finds matches by probing with the other table's rows. Each join method has steps to process data and produce the joined result efficiently depending on the data size and indexes.

Practice

(1/5)
1. Which join algorithm compares each row of one table with every row of another table to find matching pairs?
easy
A. Index join
B. Sort-merge join
C. Hash join
D. Nested loop join

Solution

  1. Step 1: Understand the nested loop join process

    Nested loop join works by taking each row from the first table and comparing it with every row in the second table to find matches.
  2. Step 2: Compare with other join types

    Sort-merge join sorts and merges, hash join uses hashing, and index join uses indexes, so they do not compare every pair.
  3. Final Answer:

    Nested loop join -> Option D
  4. Quick Check:

    Every row compared = Nested loop join [OK]
Hint: Nested loop = check all pairs one by one [OK]
Common Mistakes:
  • Confusing nested loop with hash join
  • Thinking sort-merge compares all pairs
  • Assuming index join is same as nested loop
2. Which of the following is the correct description of a hash join algorithm?
easy
A. Sort both tables and merge matching rows
B. Use a hash table to find matching rows quickly
C. Compare each row of one table with every row of another
D. Use indexes to speed up join operations

Solution

  1. Step 1: Recall hash join working

    Hash join builds a hash table on one table's join key to quickly find matching rows from the other table.
  2. Step 2: Eliminate other options

    Sorting and merging is sort-merge join, comparing all pairs is nested loop, and using indexes is index join, not hash join.
  3. Final Answer:

    Use a hash table to find matching rows quickly -> Option B
  4. Quick Check:

    Hash table = fast matching [OK]
Hint: Hash join uses hash table for fast lookup [OK]
Common Mistakes:
  • Mixing hash join with sort-merge join
  • Thinking hash join compares all pairs
  • Confusing index usage with hash join
3. Consider two tables A and B, each with 1000 rows. Which join algorithm is likely to perform best if both tables are already sorted on the join key?
medium
A. Nested loop join
B. Hash join
C. Sort-merge join
D. Cartesian join

Solution

  1. Step 1: Analyze the condition of sorted tables

    Since both tables are sorted on the join key, sort-merge join can efficiently merge them by scanning once through both tables.
  2. Step 2: Compare with other algorithms

    Nested loop join checks all pairs (slow), hash join builds hash table (extra work), Cartesian join is unrelated and very expensive.
  3. Final Answer:

    Sort-merge join -> Option C
  4. Quick Check:

    Sorted tables = sort-merge join best [OK]
Hint: Sorted tables? Use sort-merge join [OK]
Common Mistakes:
  • Choosing nested loop for sorted tables
  • Assuming hash join is always fastest
  • Confusing Cartesian join with normal join
4. A database query uses a hash join but runs very slowly. Which of the following is a likely cause?
medium
A. The hash table does not fit in memory causing disk spills
B. The join keys are unique in both tables
C. Both tables are already sorted
D. The nested loop join was used instead

Solution

  1. Step 1: Understand hash join memory use

    Hash join builds a hash table in memory. If it is too large, it spills to disk, slowing performance.
  2. Step 2: Evaluate other options

    Sorted tables do not slow hash join, unique keys help performance, and nested loop join is a different algorithm.
  3. Final Answer:

    The hash table does not fit in memory causing disk spills -> Option A
  4. Quick Check:

    Memory overflow = slow hash join [OK]
Hint: Hash join slow? Check memory size for hash table [OK]
Common Mistakes:
  • Blaming sorted tables for hash join slowness
  • Ignoring memory limits in hash join
  • Confusing nested loop join with hash join
5. You have two large tables to join on a key. Table A fits in memory but Table B is very large and unsorted. Which join algorithm is best to minimize disk I/O and why?
hard
A. Hash join, because building a hash on smaller Table A allows fast matching
B. Sort-merge join, because sorting both tables reduces I/O
C. Cartesian join, because it joins all rows regardless of keys
D. Nested loop join, because it checks all pairs without sorting

Solution

  1. Step 1: Consider table sizes and memory

    Table A fits in memory, so building a hash table on it is efficient. Table B is large and unsorted, so sorting it would be expensive.
  2. Step 2: Choose join algorithm minimizing disk I/O

    Hash join uses the smaller table in memory to quickly find matches in the larger table, reducing disk reads compared to sorting or nested loops.
  3. Final Answer:

    Hash join, because building a hash on smaller Table A allows fast matching -> Option A
  4. Quick Check:

    Small table in memory = hash join best [OK]
Hint: Small table fits memory? Use hash join [OK]
Common Mistakes:
  • Choosing nested loop for large tables
  • Preferring sort-merge without considering sorting cost
  • Confusing Cartesian join with normal join