np.in1d() for membership testing in NumPy - Time & Space Complexity
We want to understand how the time needed to check if elements belong to another list grows as the lists get bigger.
How does the work increase when using np.in1d() with larger arrays?
Analyze the time complexity of the following code snippet.
import numpy as np
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([3, 4, 5, 6, 7])
result = np.in1d(arr1, arr2)
print(result)
This code checks which elements of arr1 are also in arr2, returning a boolean array.
- Primary operation: For each element in the first array, check if it exists in the second array.
- How many times: This check repeats once for every element in the first array.
As the first array gets bigger, the number of checks grows directly with its size.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 membership checks |
| 100 | About 100 membership checks |
| 1000 | About 1000 membership checks |
Pattern observation: The work grows in a straight line with the size of the first array.
Time Complexity: O(n)
This means the time to complete the check grows directly in proportion to the number of elements in the first array.
[X] Wrong: "The time depends on both arrays multiplied together because it checks every pair."
[OK] Correct: np.in1d() uses efficient methods internally, so it does not check every pair but rather looks up membership quickly, making the time mainly depend on the first array size.
Understanding how membership checks scale helps you explain and reason about data filtering and comparison tasks, which are common in data science work.
"What if we changed the second array to a Python set before checking membership? How would the time complexity change?"