Boolean indexing in Pandas - Time & Space Complexity
We want to understand how the time needed to filter data using Boolean indexing changes as the data grows.
How does the filtering time grow when we have more rows in our data?
Analyze the time complexity of the following code snippet.
import pandas as pd
df = pd.DataFrame({
'age': [23, 45, 12, 36, 27],
'score': [88, 92, 79, 95, 85]
})
filtered = df[df['age'] > 25]
This code creates a table with ages and scores, then selects rows where age is greater than 25.
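To make the two steps behind the one-liner visible, the sketch below splits it into the comparison, which produces one True/False value per row, and the selection that keeps the rows marked True. The variable name `mask` is introduced here for illustration and is not part of the original snippet.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [23, 45, 12, 36, 27],
    'score': [88, 92, 79, 95, 85]
})

# Step 1: compare every 'age' value with 25 (one Boolean per row)
mask = df['age'] > 25

# Step 2: keep only the rows where the mask is True
filtered = df[mask]

print(mask.tolist())   # [False, True, False, True, True]
print(filtered)        # rows with index 1, 3 and 4
```

The mask always has exactly as many entries as the table has rows, which is where the per-row cost comes from.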
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Checking each row's 'age' value against 25.
- How many times: Once for every row in the table.
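The repeated operation can be pictured as an explicit loop. This is only a conceptual sketch in plain Python: pandas performs the comparison in vectorized, compiled code rather than a Python-level loop, but the number of comparisons is the same. The names `ages`, `checks`, and `mask` are illustrative.

```python
ages = [23, 45, 12, 36, 27]

checks = 0
mask = []
for age in ages:            # one iteration per row
    mask.append(age > 25)   # the repeated operation: one comparison
    checks += 1

print(checks)  # 5, one check per row
print(mask)    # [False, True, False, True, True]
```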
As the number of rows grows, the total number of checks grows in direct proportion. The cost of each individual check stays the same.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 checks |
| 100 | 100 checks |
| 1000 | 1000 checks |
Pattern observation: Doubling the rows doubles the number of checks needed.
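One way to see this pattern on real hardware is to time the filter at doubling sizes. The sketch below is a rough measurement, not a rigorous benchmark: the helper `time_filter`, the random data, and the chosen sizes are all assumptions made for illustration, and absolute timings vary by machine. For very small tables, fixed overhead tends to hide the linear trend, so the sizes here are fairly large.

```python
import time

import numpy as np
import pandas as pd


def time_filter(n, repeats=5):
    """Return (number of rows checked, best time in seconds) for n rows."""
    rng = np.random.default_rng(0)
    df = pd.DataFrame({'age': rng.integers(0, 100, size=n)})
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        mask = df['age'] > 25   # one comparison per row
        _ = df[mask]            # copy the matching rows
        best = min(best, time.perf_counter() - start)
    return len(mask), best


# Each size is double the previous one.
for n in (250_000, 500_000, 1_000_000, 2_000_000):
    checks, seconds = time_filter(n)
    print(f"n={n:>9,}  checks={checks:>9,}  best time={seconds:.4f}s")
```

If the growth is linear, each line should take roughly twice as long as the one before it, allowing for measurement noise.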
Time Complexity: O(n)
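This means the time to filter grows linearly with the number of rows in the data. Copying the k matching rows into the result adds O(k) work, and since k is at most n, the total stays O(n). On the space side, the Boolean mask needs one entry per row, O(n), and the filtered result needs O(k).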
[X] Wrong: "Filtering with Boolean indexing is instant no matter how big the data is."
[OK] Correct: Each row must be checked, so more rows mean more work and more time.
Understanding how filtering scales helps you write efficient data code and explain your choices clearly.
"What if we filter using multiple conditions combined with & or |? How would the time complexity change?"
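As a starting point for this question: pandas combines Boolean masks with the element-wise operators `&` and `|`, with each condition wrapped in parentheses. Python's `and` and `or` raise a `ValueError` when applied to a Series. The sketch below reuses the example table, and the names `both` and `either` are illustrative. Each comparison is one pass over the rows and each `&` or `|` is one more pass, so a fixed number of conditions multiplies the work by a constant and the filter remains O(n).

```python
import pandas as pd

df = pd.DataFrame({
    'age': [23, 45, 12, 36, 27],
    'score': [88, 92, 79, 95, 85]
})

# Two comparisons plus one combining step: about 3 passes over n rows.
both = df[(df['age'] > 25) & (df['score'] > 90)]     # AND: both must hold
either = df[(df['age'] > 25) | (df['score'] > 90)]   # OR: at least one holds

print(both)     # rows with index 1 and 3
print(either)   # rows with index 1, 3 and 4
```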