0
0
Pandasdata~5 mins

Sorting by values in Pandas - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Sorting by values
O(n log n)
Understanding Time Complexity

When we sort data in pandas, we want to know how the time it takes changes as the data grows.

We ask: How does sorting time increase when we have more rows?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


import pandas as pd

df = pd.DataFrame({
    'A': [5, 2, 9, 1, 7],
    'B': [3, 8, 4, 6, 0]
})
sorted_df = df.sort_values(by='A')

This code creates a small table and sorts it by the values in column 'A'.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Primary operation: Comparing and rearranging rows based on column 'A' values.
  • How many times: The sorting algorithm compares elements multiple times, roughly proportional to the number of rows times the logarithm of the number of rows.
How Execution Grows With Input

As the number of rows grows, the sorting work grows a bit faster than just the number of rows.

Input Size (n)Approx. Operations
10About 30 to 40 comparisons
100About 600 to 700 comparisons
1000About 10,000 to 12,000 comparisons

Pattern observation: The work grows faster than just the number of rows, but not as fast as the square of rows.

Final Time Complexity

Time Complexity: O(n log n)

This means sorting takes more time as data grows, but it grows in a balanced way, not too slow or too fast.

Common Mistake

[X] Wrong: "Sorting time grows exactly like the number of rows (linear)."

[OK] Correct: Sorting needs to compare many pairs, so it grows a bit faster than just the number of rows.

Interview Connect

Understanding sorting time helps you explain how data operations scale, a useful skill when working with real datasets.

Self-Check

"What if we sorted by multiple columns instead of one? How would the time complexity change?"