np.unique() for unique elements in NumPy - Time & Space Complexity
We want to understand how the time to find unique elements in an array changes as the array gets bigger. In other words: how does the work grow when we ask NumPy to find unique values?
Analyze the time complexity of the following code snippet.
import numpy as np
arr = np.array([3, 1, 2, 3, 4, 2, 1])
unique_vals = np.unique(arr)
print(unique_vals)
This code finds all unique values in the array arr and returns them sorted.
Identify the loops, recursion, or array traversals that repeat work.
- Primary operation: sorting the array internally. np.unique sorts the elements first, then drops adjacent duplicates.
- How many times: the sort performs roughly n log n comparisons, where n is the number of elements; the duplicate-removal pass afterward is only O(n).
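The two steps above can be sketched in code. This is an illustrative model, not NumPy's actual source: sort the array, then keep each element that differs from its left neighbor. The sort dominates the cost at O(n log n).

```python
import numpy as np

def unique_sketch(arr):
    # Step 1: sort -- this is the O(n log n) part
    s = np.sort(arr)
    # Step 2: keep elements that differ from their left neighbor -- O(n)
    keep = np.empty(len(s), dtype=bool)
    keep[0] = True
    keep[1:] = s[1:] != s[:-1]
    return s[keep]

arr = np.array([3, 1, 2, 3, 4, 2, 1])
print(unique_sketch(arr))   # [1 2 3 4]
print(np.unique(arr))       # same result
```

Note that `unique_sketch` is a hypothetical helper named here for illustration; the point is that once the data is sorted, duplicates sit next to each other and one linear pass removes them.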
As the array size grows, the sorting step takes more time, and the growth is slightly faster than linear: doubling n slightly more than doubles the work.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 to 40 operations |
| 100 | About 600 to 700 operations |
| 1000 | About 10,000 to 12,000 operations |
Pattern observation: The operations grow a bit faster than the input size, roughly like n times log n.
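You can reproduce the ballpark figures in the table above by computing n · log2(n) directly (the counts are rough estimates, since real sort implementations differ in constants):

```python
import math

# Approximate comparison counts for an O(n log n) sort
for n in (10, 100, 1000):
    print(n, round(n * math.log2(n)))
# 10   -> 33
# 100  -> 664
# 1000 -> 9966
```

These match the table's ranges: tens of operations for n = 10, hundreds for n = 100, and roughly ten thousand for n = 1000.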
Time Complexity: O(n log n)
This means the time to find unique elements grows slightly faster than the size of the array, because the internal sort adds a log n factor on top of the linear pass.
[X] Wrong: "Finding unique elements takes time proportional only to the number of elements (O(n))."
[OK] Correct: Because np.unique sorts the array internally to find unique values, the sorting step dominates the cost, so the total work is O(n log n) rather than O(n).
Understanding how NumPy finds unique elements helps you explain efficiency when working with data. It shows you can reason about what happens behind the scenes, which is a useful skill in data science.
"What if the input array was already sorted? How would the time complexity of np.unique() change?"
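One way to think about this question: np.unique has no "already sorted" hint, so it sorts regardless of the input's order (many sort implementations finish faster on sorted data, but the complexity is generally still O(n log n)). If you *know* the array is sorted, duplicates must be adjacent, so a single O(n) neighbor-comparison pass is enough. A minimal sketch, assuming sorted input:

```python
import numpy as np

# Assumption: the input is already sorted, so duplicates are adjacent.
# One linear pass replaces np.unique's internal sort entirely: O(n).
sorted_arr = np.array([1, 1, 2, 2, 3, 3, 4])
mask = np.empty(len(sorted_arr), dtype=bool)
mask[0] = True
mask[1:] = sorted_arr[1:] != sorted_arr[:-1]
print(sorted_arr[mask])   # [1 2 3 4]
```

If the input isn't actually sorted, this shortcut silently returns wrong results, so the sorted-input assumption must genuinely hold.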