Partial sorting with np.partition() in NumPy - Time & Space Complexity
We want to understand how the time needed to partially sort data with np.partition() changes as the data size grows.
Specifically, how does the work increase when we ask for elements around a certain position?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

arr = np.random.randint(0, 1000, size=1000)
k = 10
part = np.partition(arr, k)   # elements before index k are the k smallest
result = part[:k]
```
This code finds the 10 smallest elements of an array of 1000 random integers using partial sorting. Note that np.partition guarantees only that the first k elements are the k smallest; it does not sort them among themselves.
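A quick check of that behavior on a small array with known values (chosen here purely for illustration):

```python
import numpy as np

arr = np.array([7, 2, 9, 4, 1, 8, 3, 6, 5, 0])
k = 3

part = np.partition(arr, k)
smallest = part[:k]

# The first k entries are the k smallest values, but their internal
# order is not guaranteed, so sort the slice if order matters.
print(sorted(smallest.tolist()))  # [0, 1, 2]
```

Sorting just the k-element slice afterward costs only O(k log k), which is negligible when k is much smaller than n.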
Look for loops or repeated steps inside the function.
- Primary operation: Partitioning the array around the kth element.
- How many times: Each partition pass scans a portion of the array, and those portions shrink rapidly from one pass to the next, so the array is never fully sorted.
As the array size grows, the time to partially sort grows more slowly than the time to fully sort: roughly linearly, versus O(n log n) for a full sort.
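To see why, here is a minimal pure-Python quickselect sketch. This is not NumPy's actual introselect implementation, just an illustration of the idea: after each partition pass it descends into only the side that contains index k, so the amount of work shrinks roughly geometrically instead of covering both halves the way a full quicksort does.

```python
import random

def quickselect_partition(data, k):
    """Rearrange data in place so data[:k] holds the k smallest values.

    Illustrative quickselect sketch; np.partition uses a tuned
    introselect variant, not this exact code.
    """
    lo, hi = 0, len(data) - 1
    while lo < hi:
        pivot = data[random.randint(lo, hi)]
        i, j = lo, hi
        # Hoare-style partition around the pivot value.
        while i <= j:
            while data[i] < pivot:
                i += 1
            while data[j] > pivot:
                j -= 1
            if i <= j:
                data[i], data[j] = data[j], data[i]
                i += 1
                j -= 1
        # Unlike quicksort, recurse into only the side containing index k.
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            break
    return data

vals = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
quickselect_partition(vals, 4)
print(sorted(vals[:4]))  # [0, 1, 2, 3]
```

Because only one side is kept each round, the expected total work is about n + n/2 + n/4 + ... which sums to roughly 2n, i.e. O(n) on average.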
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20-30 operations |
| 100 | About 200-300 operations |
| 1000 | About 2000-3000 operations |
Pattern observation: The operations grow roughly linearly with n, much slower than the n log n growth of fully sorting the array.
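A rough empirical check of this pattern (the sizes and exact timings here are illustrative and will vary by machine; the only hard guarantee tested is that partition and full sort agree on the k smallest values):

```python
import time

import numpy as np

rng = np.random.default_rng(42)
k = 10

for n in (1_000, 10_000, 100_000):
    data = rng.integers(0, 1_000_000, size=n)

    t0 = time.perf_counter()
    part = np.partition(data, k)
    t_partition = time.perf_counter() - t0

    t0 = time.perf_counter()
    full = np.sort(data)
    t_sort = time.perf_counter() - t0

    # Both approaches must agree on the k smallest values.
    assert np.array_equal(np.sort(part[:k]), full[:k])
    print(f"n={n:>7}  partition={t_partition:.5f}s  sort={t_sort:.5f}s")
```

On typical runs the gap between the two timings widens as n grows, consistent with O(n) versus O(n log n).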
Time Complexity: O(n)
This means the time to partially sort grows in direct proportion to the size of the input array; NumPy's np.partition uses an introselect algorithm that achieves this linear bound.
[X] Wrong: "Partial sorting with np.partition() takes as long as fully sorting the array."
[OK] Correct: Partial sorting only places the kth element in its final sorted position and gathers the smaller elements before it, so it avoids the O(n log n) cost of a full sort and runs faster.
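The same idea extends to indices: np.argpartition returns positions rather than values, which is handy for top-k queries where you need to know where the extreme values live. A small sketch (the score values are assumptions for illustration):

```python
import numpy as np

scores = np.array([12, 95, 7, 88, 50, 3, 91, 64])
k = 3

# Partition on the (n - k)th position so the last k slots hold the
# indices of the k largest values (in arbitrary order).
idx = np.argpartition(scores, len(scores) - k)[-k:]
top_k = scores[idx]

print(sorted(top_k.tolist()))  # [88, 91, 95]
```

This is a common pattern for "top k largest" because it keeps the O(n) average cost while also recovering the original positions.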
Understanding how partial sorting works and its time cost shows you can choose the right tool for finding top elements efficiently, a useful skill in many data tasks.
"What if we asked for the top k elements multiple times on the same array? How would the time complexity change if we reused partial results?"