Sorting by values in Pandas - Time & Space Complexity
When we sort data in pandas, we want to know how the time taken changes as the data grows. The question is: how does sorting time increase as we add more rows?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 2, 9, 1, 7],
    'B': [3, 8, 4, 6, 0]
})
sorted_df = df.sort_values(by='A')
```
This code creates a small table and sorts it by the values in column 'A'.
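To make the operation concrete, here is a quick check of what the sort actually produces: column 'A' ends up in ascending order, and each row keeps its original index label.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 2, 9, 1, 7],
    'B': [3, 8, 4, 6, 0]
})
sorted_df = df.sort_values(by='A')

# 'A' is now ascending; the index shows where each row came from.
print(sorted_df['A'].tolist())    # [1, 2, 5, 7, 9]
print(sorted_df.index.tolist())   # [3, 1, 0, 4, 2]
```

Note that `sort_values` returns a new DataFrame by default; the original `df` is left unchanged.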
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: Comparing and rearranging rows based on column 'A' values.
- How many times: The sorting algorithm compares elements many times, roughly proportional to n × log(n), where n is the number of rows.
As the number of rows grows, the sorting work grows a bit faster than just the number of rows.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 to 40 comparisons |
| 100 | About 600 to 700 comparisons |
| 1000 | About 10,000 to 12,000 comparisons |
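The table's ranges can be sanity-checked by computing n · log₂(n) directly; these are rough approximations of comparison counts, not exact measurements of any particular sorting algorithm:

```python
import math

# Approximate comparison counts for the table's input sizes.
for n in [10, 100, 1000]:
    ops = n * math.log2(n)
    print(f"n = {n}: ~{ops:.0f} comparisons")
# n = 10:   ~33
# n = 100:  ~664
# n = 1000: ~9966
```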
Pattern observation: The work grows faster than just the number of rows, but not as fast as the square of rows.
Time Complexity: O(n log n)
This means sorting takes more time as data grows, but it grows in a balanced way, not too slow or too fast.
[X] Wrong: "Sorting time grows exactly like the number of rows (linear)."
[OK] Correct: Sorting needs to compare many pairs, so it grows a bit faster than just the number of rows.
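One way to see this directly is to count comparisons while sorting. The sketch below uses Python's built-in `sorted` with a counting comparison function (an illustration of comparison sorts in general, not of pandas internals); on random data the count lands well above n but well below n²:

```python
import functools
import random

def count_comparisons(values):
    """Sort a copy of values, counting how many comparisons are made."""
    count = 0
    def cmp(a, b):
        nonlocal count
        count += 1
        return (a > b) - (a < b)
    sorted(values, key=functools.cmp_to_key(cmp))
    return count

for n in [10, 100, 1000]:
    vals = random.sample(range(n * 10), n)
    print(f"n = {n}: {count_comparisons(vals)} comparisons")
```

For n = 100, the count is typically in the hundreds: far more than 100 (so not linear), far fewer than 10,000 (so not quadratic).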
Understanding sorting time helps you explain how data operations scale, a useful skill when working with real datasets.
"What if we sorted by multiple columns instead of one? How would the time complexity change?"