Boolean indexing in Data Analysis Python - Time & Space Complexity
We want to understand how the time to filter data using Boolean indexing changes as the data size grows.
How does the filtering time increase when we have more data rows?
Analyze the time complexity of the following code snippet.
import pandas as pd

# Small example table with five rows.
data = pd.DataFrame({
    'age': [23, 45, 12, 36, 52],
    'score': [88, 92, 79, 94, 67]
})

# Keep only the rows where 'age' is greater than 30.
filtered = data[data['age'] > 30]
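This code keeps only the rows whose 'age' value is greater than 30. The comparison data['age'] > 30 first builds a Boolean mask with one True or False value per row, and indexing data with that mask then selects the rows marked True.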
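To make the two steps visible, here is a minimal sketch that stores the mask in its own variable before using it. The variable name `mask` is chosen for illustration and does not appear in the original snippet.

```python
import pandas as pd

data = pd.DataFrame({
    'age': [23, 45, 12, 36, 52],
    'score': [88, 92, 79, 94, 67],
})

# Step 1: the comparison produces one True/False value per row.
mask = data['age'] > 30
print(mask.tolist())  # [False, True, False, True, True]

# Step 2: indexing with the mask keeps only the rows marked True.
filtered = data[mask]
print(filtered['age'].tolist())  # [45, 36, 52]
```

The mask always has the same length as the data, which is why the work grows with the number of rows.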
Identify any loops, recursion, or array traversals that repeat.
- Primary operation: Checking each row's 'age' value against 30.
- How many times: Once for every row in the data.
As the number of rows grows, the number of checks grows at the same rate.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 checks |
| 100 | 100 checks |
| 1000 | 1000 checks |
Pattern observation: The number of operations grows directly with the number of rows.
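One way to check this pattern empirically is to time the filter on tables of increasing size. The sketch below is only an illustration: the helper `time_filter`, the random 'age' values, and the chosen sizes are assumptions, not part of the original example. It uses larger sizes than the table above because, for very small tables, fixed per-call overhead tends to hide the per-row cost.

```python
import timeit

import numpy as np
import pandas as pd


def time_filter(n, repeats=5):
    """Return the best-of-`repeats` time in seconds to filter n rows."""
    rng = np.random.default_rng(0)  # fixed seed so runs use the same data
    data = pd.DataFrame({'age': rng.integers(0, 100, size=n)})
    # Run the filter once per repetition and keep the fastest run.
    timings = timeit.repeat(lambda: data[data['age'] > 30],
                            number=1, repeat=repeats)
    return min(timings)


for n in (10_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: {time_filter(n):.6f} s")
```

Exact timings depend on the machine and library versions, so treat the numbers as a rough trend rather than a precise measurement.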
Time Complexity: O(n)
This means the time to filter grows in direct proportion to the number of rows in the data. The Boolean mask also holds one value per row, so the extra memory it needs grows as O(n) too.
[X] Wrong: "Filtering with Boolean indexing is instant no matter how big the data is."
[OK] Correct: Each row must be checked, so more rows mean more work and more time.
Understanding how filtering scales helps you explain data processing speed and efficiency clearly in real projects.
"What if we filter using multiple conditions combined with &? How would the time complexity change?"
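As a starting point for that question, here is a minimal sketch that combines two conditions with &. The second condition on 'score' is an assumed example, not taken from the original. Each comparison makes one pass over the rows, and & makes one more pass to combine the two masks, so a fixed number of conditions adds a constant factor and the overall growth stays O(n).

```python
import pandas as pd

data = pd.DataFrame({
    'age': [23, 45, 12, 36, 52],
    'score': [88, 92, 79, 94, 67],
})

# Each comparison is one pass over the rows; & is one more elementwise pass.
# The parentheses are required because & binds more tightly than > in Python.
mask = (data['age'] > 30) & (data['score'] > 90)

filtered = data[mask]
print(filtered)
```

With k conditions the work is roughly k passes over n rows, which is still linear in n when k is fixed.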