np.union1d() for union in NumPy - Time & Space Complexity
We want to understand how the time needed to find the union of two arrays changes as the arrays get bigger.
Specifically, how does np.union1d() behave when input size grows?
Analyze the time complexity of the following code snippet.
import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])
result = np.union1d(arr1, arr2)
print(result)
This code finds all unique elements from both arrays combined, sorted in order.
Identify the loops, recursion, array traversals that repeat.
- Primary operation: Sorting and merging the two arrays after removing duplicates.
- How many times: The arrays are traversed multiple times internally, mainly during sorting and merging steps.
As the size of the input arrays grows, the time to find the union grows a bit faster than the size itself.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 log 10 (around 33) |
| 100 | About 100 log 100 (around 664) |
| 1000 | About 1000 log 1000 (around 9966) |
Pattern observation: The operations grow a bit faster than the input size, roughly multiplying by the input size times the log of the input size.
Time Complexity: O(n log n)
This means the time needed grows a bit faster than the input size, mainly because sorting is involved.
[X] Wrong: "np.union1d() just scans the arrays once, so it runs in linear time O(n)."
[OK] Correct: Internally, np.union1d() sorts the combined data, which takes more time than just scanning once. So it is slower than linear time.
Understanding how functions like np.union1d() scale helps you reason about performance in data tasks. This skill shows you can think about efficiency, which is valuable in real projects.
"What if the input arrays were already sorted? How would that affect the time complexity of np.union1d()?"