Sorting along axes in NumPy - Time & Space Complexity
Sorting is a common step in data science workflows for organizing information. When sorting along an axis in NumPy, we want to know how the running time grows as the array grows.
We ask: How does sorting time change when we have bigger arrays or sort along different axes?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

arr = np.random.rand(1000, 1000)
sorted_arr = np.sort(arr, axis=1)
```
This code creates a 1000x1000 array of random numbers and sorts each row independently.
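To see the row-wise behavior concretely, here is a minimal sketch on a small array (values chosen only for illustration):

```python
import numpy as np

# A tiny 2x3 array makes the per-row behavior easy to inspect.
small = np.array([[3, 1, 2],
                  [9, 7, 8]])

# axis=1 sorts within each row; values never move between rows.
print(np.sort(small, axis=1))
# [[1 2 3]
#  [7 8 9]]
```

Each row ends up in ascending order on its own, which is exactly the operation whose cost we are counting.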
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Sorting each row of the 2D array.
- How many times: Sorting is done once per row, so 1000 times for 1000 rows.
Sorting each row takes time depending on the row length. Doing this for all rows adds up.
| Input Size (n x m) | Approx. Operations |
|---|---|
| 10 x 10 | 10 rows × (10 log 10) per row |
| 100 x 100 | 100 rows × (100 log 100) per row |
| 1000 x 1000 | 1000 rows × (1000 log 1000) per row |
Pattern observation: The time grows roughly with the number of rows times the sorting cost per row, which depends on the row length times its logarithm.
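As a rough empirical check of this pattern, one can time the sort at a couple of sizes. This is a sketch only: absolute times depend on your hardware and are noisy, so only the relative growth is meaningful.

```python
import numpy as np
import time

# Time np.sort along axis=1 at two square sizes (illustrative, not a benchmark).
for n in (100, 1000):
    arr = np.random.rand(n, n)
    start = time.perf_counter()
    result = np.sort(arr, axis=1)
    elapsed = time.perf_counter() - start
    print(f"{n}x{n}: {elapsed:.4f}s")
```

Going from 100x100 to 1000x1000 multiplies n·m log m by well over 100x, and the measured times should grow by roughly that factor.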
Time Complexity: O(n * m log m)
Sorting one row of length m takes O(m log m), and repeating this for all n rows gives a total of O(n · m log m).
[X] Wrong: "Sorting the whole 2D array is just O(n log n) because it's one big sort."
[OK] Correct: Sorting along an axis sorts each row separately, so the cost is n independent sorts of length m. Sorting all n·m elements as one flat list would instead cost O(nm log(nm)), which is a different operation.
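The two operations above can be contrasted directly: `axis=1` sorts each row in place within its row, while `axis=None` flattens the array and sorts all elements as one list. A small sketch (illustrative values):

```python
import numpy as np

arr = np.array([[5, 1],
                [4, 2]])

# axis=1: each row sorted independently; the 2D shape is preserved.
print(np.sort(arr, axis=1))
# [[1 5]
#  [2 4]]

# axis=None: the array is flattened and sorted as a single 1D list.
print(np.sort(arr, axis=None))
# [1 2 4 5]
```

Note that 5 stays in the first row under `axis=1` but moves to the end under `axis=None`: the two results are genuinely different, not just differently shaped.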
Understanding how sorting time grows with data size helps you explain performance in real tasks. It shows you can think about how algorithms work on multi-dimensional data.
"What if we sorted along axis=0 (columns) instead of axis=1? How would the time complexity change?"