Left join behavior in Pandas - Time Complexity
When we combine two tables using a left join, we want to understand how the running time grows as the tables get bigger.
The question we ask: how does the work increase as the number of rows in the tables increases?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3, 4], 'value_left': ['A', 'B', 'C', 'D']})
right = pd.DataFrame({'key': [3, 4, 5], 'value_right': ['X', 'Y', 'Z']})
result = pd.merge(left, right, on='key', how='left')
```
This code joins two tables on the 'key' column, keeping all rows from the left table and matching rows from the right.
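Running the snippet makes the semantics concrete: all four left rows survive, and left keys with no match in the right table (1 and 2) get `NaN` in `value_right`, while the unmatched right key 5 is dropped.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3, 4], 'value_left': ['A', 'B', 'C', 'D']})
right = pd.DataFrame({'key': [3, 4, 5], 'value_right': ['X', 'Y', 'Z']})

# Left join: keep every left row; unmatched keys get NaN on the right side.
result = pd.merge(left, right, on='key', how='left')
print(result)
#    key value_left value_right
# 0    1          A         NaN
# 1    2          B         NaN
# 2    3          C           X
# 3    4          D           Y
```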
Identify the operations that repeat: loops, recursion, or array traversals.
- Primary operation: For each row in the left table, find matching rows in the right table.
- How many times: This happens once per row in the left table.
As the left table grows, the number of lookups grows with it. The right table's size matters mainly for the one-time cost of building a hash index on its keys; after that, each individual lookup is constant time on average.
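This build-then-probe pattern can be sketched in plain Python (a simplified model of a hash join, not pandas internals): index the right rows by key once, then probe that index once per left row.

```python
def hash_left_join(left_rows, right_rows, key):
    """Toy left join over lists of dicts -- a sketch of the hash-join pattern."""
    # Build phase: index the right rows by key once (O(m)).
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    # Probe phase: one average-O(1) lookup per left row (O(n)).
    joined = []
    for row in left_rows:
        matches = index.get(row[key], [None])
        for match in matches:
            combined = dict(row)
            if match is not None:
                # Merge in the right-side columns for a matched key.
                combined.update({k: v for k, v in match.items() if k != key})
            joined.append(combined)
    return joined

left_rows = [{'key': 1, 'value_left': 'A'}, {'key': 2, 'value_left': 'B'},
             {'key': 3, 'value_left': 'C'}, {'key': 4, 'value_left': 'D'}]
right_rows = [{'key': 3, 'value_right': 'X'}, {'key': 4, 'value_right': 'Y'},
              {'key': 5, 'value_right': 'Z'}]
print(hash_left_join(left_rows, right_rows, 'key'))
```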
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 lookups |
| 100 | About 100 lookups |
| 1000 | About 1000 lookups |
Pattern observation: The work grows roughly in direct proportion to the number of rows in the left table.
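The pattern in the table can be checked with a small counter. Here `CountingDict` is a hypothetical helper (a toy model, not pandas internals) that counts probes into the right-side index; the count tracks the left-table size exactly.

```python
class CountingDict(dict):
    """dict that counts how many .get() probes are made (toy model only)."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def get(self, key, default=None):
        self.lookups += 1
        return super().get(key, default)

def count_probes(n_left_rows, right_keys):
    # Build phase: index the right keys once.
    index = CountingDict({k: True for k in right_keys})
    # Probe phase: one lookup per left row.
    for key in range(n_left_rows):
        index.get(key)
    return index.lookups

print(count_probes(10, range(5)), count_probes(100, range(5)), count_probes(1000, range(5)))
# 10 100 1000 -- probes grow with the left table, not the right
```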
Time Complexity: O(n)
More precisely, a hash join costs O(n + m), where m is the number of right-table rows hashed during the one-time build phase. When the left table dominates, the time to do a left join grows linearly with the number of rows in the left table.
[X] Wrong: "The join time depends mostly on the size of the right table."
[OK] Correct: The left join keeps all left rows, so the main matching work happens once per left row. The right table is typically hashed once for fast lookups, so its size contributes a one-time build cost rather than per-row work.
Understanding how joins scale helps you explain data processing clearly and shows you know how databases handle big data efficiently.
"What if we changed the join type from left to inner? How would the time complexity change?"
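One way to explore that question: swap `how='left'` for `how='inner'` on the same data. The build-and-probe work is unchanged, so the asymptotic cost stays linear in the table sizes; what changes is the output, which now keeps only keys present in both tables.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3, 4], 'value_left': ['A', 'B', 'C', 'D']})
right = pd.DataFrame({'key': [3, 4, 5], 'value_right': ['X', 'Y', 'Z']})

# Inner join: only keys present in BOTH tables survive.
inner = pd.merge(left, right, on='key', how='inner')
print(inner)
#    key value_left value_right
# 0    3          C           X
# 1    4          D           Y
```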