NULLs in JOIN conditions in SQL - Time & Space Complexity
When we join tables in a database, the time it takes depends on how many rows we compare. NULL values in join conditions can affect which rows match, but how does this impact the work done?
We want to understand how the presence of NULLs in join conditions affects the number of operations the database performs.
Analyze the time complexity of the following SQL join with NULLs in the condition.
SELECT a.id, b.id
FROM tableA a
LEFT JOIN tableB b
ON a.key = b.key
OR (a.key IS NULL AND b.key IS NULL);
This query joins two tables on a key column, treating NULLs as equal to each other in the join condition.
Look for repeated comparisons between rows.
- Primary operation: Comparing each row in
tableAto rows intableBto find matches. - How many times: Potentially every row in
tableAchecks against many rows intableB, especially if no indexes help.
As the number of rows grows, the comparisons increase.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 100 comparisons |
| 100 | About 10,000 comparisons |
| 1000 | About 1,000,000 comparisons |
Pattern observation: The work grows quickly as the tables get bigger, roughly multiplying the number of rows in both tables.
Time Complexity: O(n * m)
This means the time grows roughly by multiplying the number of rows in the first table by the number of rows in the second table.
[X] Wrong: "Adding NULL checks in the join condition makes the query faster because it filters rows early."
[OK] Correct: The NULL checks add complexity to each comparison and do not reduce the number of comparisons, so the total work can stay the same or increase.
Understanding how NULLs affect join conditions helps you explain query performance clearly. This skill shows you can think about how databases work behind the scenes, which is valuable in many real projects.
What if we replaced the OR condition with a COALESCE function to handle NULLs? How would that change the time complexity?