Merging on index in Pandas - Time & Space Complexity
Merging two DataFrames on their row labels (their indexes) takes time that depends on how many rows each one has.
We want to understand how that time grows as the DataFrames get bigger.
Analyze the time complexity of the following code snippet.
import pandas as pd
n = 10 # Example size
# Create two dataframes with indexes
left = pd.DataFrame({'A': range(n)}, index=range(n))
right = pd.DataFrame({'B': range(n)}, index=range(n))
# Merge on index
result = pd.merge(left, right, left_index=True, right_index=True)
This code merges two dataframes by matching their indexes, combining rows with the same index label.
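To make "combining rows with the same index label" concrete, here is a minimal sketch using two small frames whose indexes only partly overlap. The values are made up for illustration; the only behavior relied on is that `pd.merge` defaults to an inner join, so labels missing from either side are dropped.

```python
import pandas as pd

# Hypothetical frames: labels 1 and 2 appear in both, 0 and 3 in only one.
left = pd.DataFrame({'A': [10, 20, 30]}, index=[0, 1, 2])
right = pd.DataFrame({'B': [100, 200, 300]}, index=[1, 2, 3])

# Default how='inner': keep only labels present in both indexes.
result = pd.merge(left, right, left_index=True, right_index=True)
print(result)
```

Only the rows labelled 1 and 2 survive, each carrying its `A` value from `left` and its `B` value from `right`.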
- Primary operation: Taking each index label in one dataframe and looking up the matching label in the other.
- How many times: Once for each row in the first dataframe, so n times.
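The counting above can be sketched in plain Python. This is not how pandas implements the merge internally; it is a simplified model, assuming unique labels and a dictionary for lookups, and the name `match_on_index` is hypothetical.

```python
def match_on_index(left_labels, right_labels):
    """Pair up positions whose labels are equal, counting the lookups made."""
    # Map each right-hand label to its position (assumes labels are unique).
    right_pos = {label: i for i, label in enumerate(right_labels)}
    matches = []
    lookups = 0
    for i, label in enumerate(left_labels):
        lookups += 1                  # one lookup per row on the left
        j = right_pos.get(label)
        if j is not None:
            matches.append((i, j))
    return matches, lookups

matches, lookups = match_on_index(range(10), range(10))
print(len(matches), lookups)  # prints: 10 10
```

Each left-hand row triggers exactly one lookup, which is where the "n times" figure comes from.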
As the number of rows grows, the work of matching index labels grows in roughly the same proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 index matches |
| 100 | About 100 index matches |
| 1000 | About 1000 index matches |
Pattern observation: The number of operations grows directly with the number of rows.
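One way to check this pattern is to time the merge at a few sizes. The sketch below reuses the setup from the snippet above; `time_merge` is a hypothetical helper, and the timings will vary by machine and pandas version, so treat the numbers as a rough trend rather than a benchmark.

```python
import time
import pandas as pd

def time_merge(n):
    """Build two n-row frames, merge them on index, return (rows, seconds)."""
    left = pd.DataFrame({'A': range(n)}, index=range(n))
    right = pd.DataFrame({'B': range(n)}, index=range(n))
    start = time.perf_counter()
    result = pd.merge(left, right, left_index=True, right_index=True)
    elapsed = time.perf_counter() - start
    return len(result), elapsed

for n in (1_000, 10_000, 100_000):
    rows, seconds = time_merge(n)
    print(f"n={n:>7}  rows={rows:>7}  time={seconds:.6f}s")
```

For small n, fixed overhead tends to dominate, so the growth is easier to see at the larger sizes.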
Time Complexity: O(n) when the index labels are unique, as they are here
This means the time to merge grows linearly with the number of rows: doubling n roughly doubles the work.
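The linear estimate relies on each label appearing at most once per index. As a small illustration with made-up data, when every row shares the same label, each left row matches every right row, so the output (and the work to build it) grows with the product of the two sizes.

```python
import pandas as pd

k = 3  # hypothetical number of rows per frame

# Every row carries the same label 'x', so the indexes are not unique.
left = pd.DataFrame({'A': range(k)}, index=['x'] * k)
right = pd.DataFrame({'B': range(k)}, index=['x'] * k)

result = pd.merge(left, right, left_index=True, right_index=True)

print(left.index.is_unique)  # False
print(len(result))           # 9, i.e. k * k rows
```

Checking `index.is_unique` on both frames before merging is a cheap way to confirm that the linear estimate applies.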
[X] Wrong: "Merging on index is instant no matter how big the dataframes are."
[OK] Correct: Even though an index makes each individual lookup fast, the merge still has to process every row, so it takes longer as the data grows.
Understanding how merging on index scales lets you estimate how long a join will take as your data grows and explain that cost clearly.
"What if the indexes were not sorted or unique? How would the time complexity change?"