Why boolean masking matters in NumPy - Performance Analysis
We want to see how quickly NumPy selects data using boolean masks. How does the time to pick items grow as the data gets bigger? Analyze the time complexity of the following code snippet.
```python
import numpy as np

arr = np.arange(1000000)   # one million integers: 0, 1, ..., 999999
mask = arr % 2 == 0        # boolean array: True where the element is even
filtered = arr[mask]       # keep only the elements where mask is True
```
This code creates a large array, makes a mask for even numbers, and selects those numbers.
Identify the loops, recursion, or array traversals that repeat:
- Primary operation: Checking each element to see if it is even (creating the mask).
- How many times: Once for every element in the array.
- Secondary operation: Using the mask to pick elements (also touches each element once).
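The two passes above can be sketched in plain Python. This is a conceptual sketch of what NumPy does internally in compiled code, not its actual implementation; the helper name `filter_even` is ours:

```python
def filter_even(values):
    # Pass 1: one evenness check per element -> n operations
    mask = [v % 2 == 0 for v in values]
    # Pass 2: one mask lookup per element, keeping the True ones -> n operations
    return [v for v, keep in zip(values, mask) if keep]

print(filter_even(list(range(10))))  # [0, 2, 4, 6, 8]
```

NumPy performs the same linear passes, just in vectorized C loops, which changes the constant factor but not the O(n) growth.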
As the array gets bigger, the time to check and select grows in a straight line.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 checks and 10 picks |
| 100 | About 100 checks and 100 picks |
| 1000 | About 1000 checks and 1000 picks |
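You can check this linear pattern empirically. The sketch below times the mask-and-select step at three sizes; exact numbers vary by machine, but each 10x jump in size should cost roughly 10x the time:

```python
import time
import numpy as np

for n in (10**5, 10**6, 10**7):
    arr = np.arange(n)
    start = time.perf_counter()
    filtered = arr[arr % 2 == 0]   # build mask and select in one expression
    elapsed = time.perf_counter() - start
    print(f"n={n:>9}: {elapsed:.6f} s, kept {filtered.size} elements")
```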
Pattern observation: The work grows directly with the number of items.
Time Complexity: O(n)
This means the time to filter grows in a straight line as the data size grows.
[X] Wrong: "Boolean masking is instant no matter how big the data is."
[OK] Correct: The mask must check every item, so bigger data means more work and more time.
Understanding how boolean masking scales helps you explain data filtering clearly and confidently in real tasks.
"What if we used multiple conditions combined in the mask? How would the time complexity change?"
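As a hedged sketch of an answer: each extra condition adds one more O(n) pass over the data, so combining conditions raises the constant factor but the overall complexity stays O(n). The specific thresholds below are illustrative:

```python
import numpy as np

arr = np.arange(1000000)
# Two conditions, each checked for all n elements, combined elementwise with &.
mask = (arr % 2 == 0) & (arr < 500000)
filtered = arr[mask]
print(filtered.size)  # 250000: the even numbers below 500000
```

Note that NumPy masks use the elementwise operators `&` and `|` (with parentheses around each condition), not Python's `and`/`or`.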