Outer join behavior in Pandas - Time & Space Complexity
When we combine two tables using an outer join, we want to see how the time it takes grows as the tables get bigger.
We ask: How does the work increase when the number of rows in each table grows?
Analyze the time complexity of the following code snippet.
import pandas as pd
left = pd.DataFrame({'key': [1, 2, 3], 'value_left': ['A', 'B', 'C']})
right = pd.DataFrame({'key': [2, 3, 4], 'value_right': ['D', 'E', 'F']})
result = pd.merge(left, right, on='key', how='outer')
This code merges two tables on a key column, keeping all rows from both tables.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Matching keys between the two tables to combine rows.
- How many times: Each row in the first table is compared to rows in the second table to find matches.
As the number of rows in each table grows, the work to find matching keys grows too.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 100 comparisons |
| 100 | About 10,000 comparisons |
| 1000 | About 1,000,000 comparisons |
Pattern observation: The work grows quickly as both tables get bigger, roughly multiplying their sizes.
Time Complexity: O(n * m)
This means the time to join grows roughly by multiplying the number of rows in each table.
[X] Wrong: "Outer joins always take the same time no matter the table size."
[OK] Correct: The time depends on how many rows each table has because the join checks many pairs of rows.
Understanding how joins scale helps you explain database performance clearly and shows you know how data size affects work.
"What if we changed the join type from outer to inner? How would the time complexity change?"