
Sorting by index in Pandas - Time & Space Complexity


Time Complexity: Sorting by index

O(n log n)

Understanding Time Complexity

When we sort data by its index in pandas, we want to know how the time it takes changes as the data grows.

We ask: How much slower does sorting by index get as the number of rows grows?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

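import pandas as pd

# Five rows whose index labels are deliberately out of order.
df = pd.DataFrame({
    'value': [5, 3, 6, 2, 8]
}, index=[4, 2, 5, 1, 3])

# sort_index() returns a new DataFrame ordered by index label; df is unchanged.
sorted_df = df.sort_index()
print(sorted_df)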

This code creates a small table with an unordered index and sorts it by that index.

Identify Repeating Operations

Identify the loops, recursion, or array traversals that repeat.

Primary operation: Comparing index values to work out their sorted order, then rearranging the rows to match.
How many times: Each index value is compared with others several times; for n rows, a comparison sort makes on the order of n log n comparisons.
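The two steps described above can be sketched by hand. This is a minimal illustration, assuming a plain integer index with unique labels; it is not a description of how pandas implements `sort_index` internally. The names `order` and `manual` are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [5, 3, 6, 2, 8]}, index=[4, 2, 5, 1, 3])

# Step 1: find the row positions that would put the index labels in order.
# This is the comparison-heavy part that costs O(n log n).
order = np.argsort(df.index.to_numpy())

# Step 2: rearrange the rows using those positions, a single O(n) pass.
manual = df.iloc[order]

# For this frame, the hand-built result matches the built-in method.
print(manual.equals(df.sort_index()))
```

Splitting the work this way shows where the cost sits: finding the order dominates, while moving the rows is linear in the number of rows.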

How Execution Grows With Input

As the number of rows grows, the sorting work grows faster than the row count itself.

Pattern observation: The work is roughly proportional to the number of rows times the log of the number of rows.
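One way to see this pattern is to count comparisons directly. The sketch below uses Python's built-in `sorted` as a stand-in for a comparison sort; it does not measure pandas itself, and the helper `count_comparisons` is a hypothetical name introduced for the example.

```python
import math
import random
from functools import cmp_to_key


def count_comparisons(n, seed=0):
    """Sort n shuffled labels and return how many comparisons were made."""
    rng = random.Random(seed)
    labels = list(range(n))
    rng.shuffle(labels)

    counter = {"calls": 0}

    def compare(a, b):
        # Every call is one comparison between two index labels.
        counter["calls"] += 1
        return (a > b) - (a < b)

    sorted(labels, key=cmp_to_key(compare))
    return counter["calls"]


# Print n, the comparison count, and the count relative to n * log2(n).
for n in (1_000, 2_000, 4_000, 8_000):
    comparisons = count_comparisons(n)
    print(n, comparisons, round(comparisons / (n * math.log2(n)), 2))
```

If the last column stays roughly constant while n grows, the comparison count is tracking n log n rather than n or n squared.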

Final Time Complexity

Time Complexity: O(n log n)

This means if you double the number of rows, the sorting work slightly more than doubles, but it stays well short of quadrupling, which is what quadratic growth would give.
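The doubling claim can be checked with a little arithmetic. The sketch below treats the work as exactly n log2(n), which is an idealisation that ignores constant factors; `doubling_ratio` is a hypothetical helper name.

```python
import math


def doubling_ratio(n):
    """Ratio of (2n) * log2(2n) to n * log2(n): growth in work when n doubles."""
    return (2 * n * math.log2(2 * n)) / (n * math.log2(n))


for n in (1_024, 1_048_576):
    print(n, round(doubling_ratio(n), 2))
```

Under this model the ratio is 2.2 for 1,024 rows and 2.1 for 1,048,576 rows. It is always above 2, drifts toward 2 as n grows, and never approaches the factor of 4 that an O(n^2) algorithm would show.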

Common Mistake

[X] Wrong: "Sorting by index takes the same time no matter how many rows there are."

[OK] Correct: Sorting needs to compare many pairs of index values, so more rows mean more comparisons and more time.

Interview Connect

Understanding how sorting scales helps you explain how data operations behave as data grows, a useful skill in many data tasks.

Self-Check

"What if the index was already sorted? How would the time complexity change when sorting by index?"