Sorting by index in Pandas - Time & Space Complexity
When we sort data by its index in pandas, we want to know how the time it takes changes as the data grows.
We ask: How does sorting by index get slower or faster when we have more rows?
Analyze the time complexity of the following code snippet.
import pandas as pd
df = pd.DataFrame({
'value': [5, 3, 6, 2, 8]
}, index=[4, 2, 5, 1, 3])
sorted_df = df.sort_index()
print(sorted_df)
This code creates a small table with an unordered index and sorts it by that index.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Comparing and swapping index values to order them.
- How many times: The sorting algorithm compares pairs of index entries multiple times, depending on the number of rows.
As the number of rows grows, the sorting work grows faster than just the number of rows.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 to 40 comparisons |
| 100 | About 500 to 700 comparisons |
| 1000 | About 10,000 to 15,000 comparisons |
Pattern observation: The work grows faster than the number of rows, roughly like rows times the log of rows.
Time Complexity: O(n log n)
This means if you double the number of rows, the sorting work grows a bit more than double, but not as much as square.
[X] Wrong: "Sorting by index takes the same time no matter how many rows there are."
[OK] Correct: Sorting needs to compare many pairs of rows, so more rows mean more comparisons and more time.
Understanding how sorting scales helps you explain how data operations behave as data grows, a useful skill in many data tasks.
"What if the index was already sorted? How would the time complexity change when sorting by index?"