Right join behavior in Pandas - Time & Space Complexity
When we use a right join in pandas, we combine two tables based on matching values, keeping all rows from the right table.
We want to understand how the time it takes grows as the tables get bigger.
Analyze the time complexity of the following code snippet.
import pandas as pd

left = pd.DataFrame({
    'key': [1, 2, 3],
    'value_left': ['A', 'B', 'C']
})

right = pd.DataFrame({
    'key': [2, 3, 4],
    'value_right': ['X', 'Y', 'Z']
})

result = pd.merge(left, right, how='right', on='key')
This code joins two tables on the 'key' column, keeping all rows from the right table and matching rows from the left.
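To see which rows came from where, the same merge can be run with pandas' `indicator=True` option, which adds a `_merge` column. This is a small sketch on the example data above; the variable name `flagged` is only illustrative.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'value_left': ['A', 'B', 'C']})
right = pd.DataFrame({'key': [2, 3, 4], 'value_right': ['X', 'Y', 'Z']})

# indicator=True adds a '_merge' column that says where each row's key was found.
flagged = pd.merge(left, right, how='right', on='key', indicator=True)
print(flagged)

# Key 4 exists only in the right table, so its value_left is missing (NaN).
# Key 1 exists only in the left table, so it does not appear at all.
```

Every right-table key (2, 3, 4) appears in the result, while the unmatched left key 1 is dropped.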
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Matching keys between the two tables.
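- How many times: Each key in the right table is looked up once. pandas matches keys with hash-based lookups rather than rescanning the left table for every right row, so the left keys are also processed about once to build that lookup.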
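The matching step can be pictured with a toy hash join in plain Python. This is a simplified sketch to show where the repeated work happens, not pandas' actual implementation; the function `right_join_sketch` and the tuple-based rows are assumptions made for illustration.

```python
from collections import defaultdict

def right_join_sketch(left_rows, right_rows):
    """Toy hash-based right join on the first tuple element (the key)."""
    # One pass over the left table: group left values by key.
    left_index = defaultdict(list)
    for key, left_value in left_rows:
        left_index[key].append(left_value)

    # One pass over the right table: each dictionary lookup is O(1) on average.
    joined = []
    for key, right_value in right_rows:
        matches = left_index.get(key)
        if matches:
            for left_value in matches:
                joined.append((key, left_value, right_value))
        else:
            # No match: keep the right row and fill the left side with None.
            joined.append((key, None, right_value))
    return joined

left_rows = [(1, 'A'), (2, 'B'), (3, 'C')]
right_rows = [(2, 'X'), (3, 'Y'), (4, 'Z')]
joined = right_join_sketch(left_rows, right_rows)
print(joined)
```

In this sketch each table is traversed once, and the only extra work comes from keys that match more than one left row.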
As the number of rows in the tables grows, the work to find matching keys grows too.
| Rows per Table (n) | Approx. Operations |
|---|---|
| 10 | About 10 matching checks |
| 100 | About 100 matching checks |
| 1000 | About 1000 matching checks |
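Pattern observation: With mostly unique keys, the number of operations grows roughly in direct proportion to the number of rows being joined.
Time Complexity: O(n) when both tables have about n rows, or more precisely O(n + m) for n right rows and m left rows
This means the time to do a right join grows linearly with the total number of rows, as long as keys rarely repeat. Heavily duplicated keys multiply the matching rows, so the output (and the time) can grow as large as n × m.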
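The linear estimate assumes that keys rarely repeat. The sketch below compares output sizes for unique keys and for fully duplicated keys. The table size of 100 and the variable names are arbitrary choices for illustration, and it counts result rows rather than measuring time.

```python
import pandas as pd

n = 100

# Case 1: every key is unique, so each right row matches at most one left row.
unique_left = pd.DataFrame({'key': range(n), 'value_left': range(n)})
unique_right = pd.DataFrame({'key': range(n), 'value_right': range(n)})
unique_result = pd.merge(unique_left, unique_right, how='right', on='key')

# Case 2: every row shares the same key, so each right row matches all left rows.
dup_left = pd.DataFrame({'key': [1] * n, 'value_left': range(n)})
dup_right = pd.DataFrame({'key': [1] * n, 'value_right': range(n)})
dup_result = pd.merge(dup_left, dup_right, how='right', on='key')

print(len(unique_result))  # 100 rows: one per right row
print(len(dup_result))     # 10000 rows: n * n matching pairs
```

Because the result itself has n × n rows in the duplicated case, no join strategy can produce it in linear time.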
[X] Wrong: "The join time depends mostly on the left table size."
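[OK] Correct: In a right join, every row from the right table must appear in the result, so the output is at least as large as the right table. The left table's keys still have to be processed to find matches, so both table sizes contribute to the time.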
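One way to check the role of the right table is to hold it fixed and grow only the left table. This sketch assumes unique keys on both sides; the sizes 10 and 1000 and the name `row_counts` are illustrative.

```python
import pandas as pd

# A small right table with five unique keys.
right = pd.DataFrame({'key': range(5), 'value_right': list('VWXYZ')})

row_counts = []
for left_size in (10, 1000):
    left = pd.DataFrame({'key': range(left_size), 'value_left': range(left_size)})
    merged = pd.merge(left, right, how='right', on='key')
    row_counts.append(len(merged))

# With unique keys, the result has one row per right-table row,
# no matter how large the left table is.
print(row_counts)  # [5, 5]
```

The number of result rows follows the right table here, although pandas still has to read the larger left table to find the matches.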
Understanding how join operations scale helps you explain data merging clearly and confidently in real projects.
"What if we changed the join type to 'inner'? How would the time complexity change?"
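As a starting point for exploring that question, the sketch below runs both join types on the example tables and compares the results. It only compares row counts and does not measure time. A reasonable expectation, stated here as an assumption, is that the key-matching work is similar and only the handling of unmatched right rows differs.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'value_left': ['A', 'B', 'C']})
right = pd.DataFrame({'key': [2, 3, 4], 'value_right': ['X', 'Y', 'Z']})

right_joined = pd.merge(left, right, how='right', on='key')
inner_joined = pd.merge(left, right, how='inner', on='key')

# The right join keeps key 4 with a missing value_left;
# the inner join keeps only keys present in both tables.
print(len(right_joined), len(inner_joined))  # 3 2
```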