np.unique() for unique values in NumPy - Time & Space Complexity
We want to understand how the time needed to find unique values in an array changes as the array gets bigger.
How does the work grow when we ask NumPy to find the unique items?
Analyze the time complexity of the following code snippet.

```python
import numpy as np

arr = np.array([3, 1, 2, 3, 4, 1, 5])
unique_vals = np.unique(arr)
print(unique_vals)  # prints [1 2 3 4 5]
```
This code finds all unique values in the array arr and returns them sorted.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Sorting the array elements so that duplicates end up next to each other.
- How many times: The sort compares elements repeatedly, roughly proportional to the number of elements times the logarithm of that number (n log n).
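To make the "sort, then group duplicates" idea concrete, here is a minimal sketch of the same strategy done by hand: sort first (the O(n log n) step), then keep each element that differs from its predecessor (an O(n) scan). This is an illustration of the technique, not NumPy's internal source code.

```python
import numpy as np

arr = np.array([3, 1, 2, 3, 4, 1, 5])

# Step 1: sort so equal values sit next to each other -- O(n log n)
s = np.sort(arr)

# Step 2: keep the first element, plus every element that differs
# from its predecessor -- a single O(n) pass
mask = np.empty(len(s), dtype=bool)
mask[0] = True
mask[1:] = s[1:] != s[:-1]
manual_unique = s[mask]

print(manual_unique)                                  # [1 2 3 4 5]
print(np.array_equal(manual_unique, np.unique(arr)))  # True
```

The sort dominates the cost: once the array is sorted, finding duplicates is just one linear comparison pass.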
As the array size grows, the time to find unique values grows a bit faster than the size itself but not as fast as the square of the size.
| Input Size (n) | Approx. Operations (≈ n log₂ n) |
|---|---|
| 10 | About 30 to 40 operations |
| 100 | About 600 to 700 operations |
| 1000 | About 10,000 to 12,000 operations |
Pattern observation: The operations grow faster than the input size but slower than its square, roughly like size times log of size.
Time Complexity: O(n log n)
This means the time to find unique values grows a bit faster than the number of items, but not as fast as checking every pair.
Space Complexity: O(n)
np.unique works on a sorted copy of the (flattened) input, so the extra memory needed grows linearly with the number of items.
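We can also sanity-check the growth empirically by timing np.unique at a few sizes. The absolute numbers below depend entirely on your machine, so this sketch prints timings rather than asserting specific values; the trend should look closer to n log n than to n².

```python
import time
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the data is reproducible

# Time np.unique at increasing sizes; absolute times are
# machine-dependent, but the growth should be near-linear
# with a small logarithmic factor.
for n in (1_000, 10_000, 100_000):
    arr = rng.integers(0, n, size=n)  # random integers with duplicates
    start = time.perf_counter()
    np.unique(arr)
    elapsed = time.perf_counter() - start
    print(f"n={n:>7,}: {elapsed:.5f} s")
```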
[X] Wrong: "Finding unique values takes the same time no matter how many items there are."
[OK] Correct: The process needs to compare and sort items, so more items mean more work, not a fixed time.
Understanding how numpy finds unique values helps you explain how data processing scales, a useful skill when working with real datasets.
"What if the input array was already sorted? How would the time complexity change?"