Boolean Filtering in Python Data Analysis - Time & Space Complexity
We want to understand how the time to filter data grows as the data size increases.
How does the filtering step scale when we check each item for a condition?
Analyze the time complexity of the following code snippet.
import pandas as pd
data = pd.DataFrame({
    'age': [23, 45, 12, 36, 27],
    'score': [88, 92, 79, 85, 90]
})
filtered_data = data[data['age'] > 25]
This code filters rows where the 'age' value is greater than 25.
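The filter can be read as two steps: the comparison builds one True/False value per row (often called a mask), and indexing with that mask keeps the rows marked True. The sketch below uses the same data; the name `mask` is introduced here only for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'age': [23, 45, 12, 36, 27],
    'score': [88, 92, 79, 85, 90]
})

# Step 1: compare every 'age' value to 25, producing one True/False per row.
mask = data['age'] > 25

# Step 2: keep only the rows where the mask is True.
filtered_data = data[mask]

print(mask.tolist())   # [False, True, False, True, True]
print(filtered_data)
```

Both steps touch each row once, which is why the row count drives the cost.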
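Identify the loops, recursion, or array traversals that repeat. There is no explicit loop in the snippet, but the comparison data['age'] > 25 still visits every value in the column.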
- Primary operation: Checking each row's 'age' value against 25.
- How many times: Once for every row in the data.
As the number of rows grows, the number of checks grows at the same rate.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 checks |
| 100 | 100 checks |
| 1000 | 1000 checks |
Pattern observation: The work grows directly with the number of rows.
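The counts in the table can be reproduced with an explicit loop. This is only a sketch for counting: pandas runs the comparison in vectorized code rather than a Python loop, but it still makes one comparison per row. The helper `count_checks` and the synthetic ages are hypothetical, made up for this illustration.

```python
def count_checks(ages, threshold=25):
    """Filter ages with an explicit loop and count the comparisons made."""
    checks = 0
    kept = []
    for age in ages:
        checks += 1  # one comparison per row
        if age > threshold:
            kept.append(age)
    return kept, checks


for n in (10, 100, 1000):
    ages = list(range(n))  # synthetic ages 0..n-1, for illustration only
    _, checks = count_checks(ages)
    print(n, checks)  # the number of checks equals n
```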
Time Complexity: O(n)
This means the time to filter grows linearly with the number of rows: doubling the rows roughly doubles the work.
[X] Wrong: "Filtering is instant no matter how big the data is."
[OK] Correct: Each row must be checked, so more data means more work and more time.
Understanding how filtering scales helps you explain data processing speed clearly and confidently.
"What if we filter using two conditions combined with AND? How would the time complexity change?"
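One way to explore this question is to build each condition as its own mask and combine them with `&`. The sketch below assumes a second condition on 'score' with a threshold of 85, which is made up for illustration. Each mask takes one pass over the rows and the AND takes another, so the total is still a constant number of passes over n rows.

```python
import pandas as pd

data = pd.DataFrame({
    'age': [23, 45, 12, 36, 27],
    'score': [88, 92, 79, 85, 90]
})

# Each comparison checks every row once.
age_mask = data['age'] > 25
score_mask = data['score'] > 85  # hypothetical second condition

# Element-wise AND: one more pass over the rows.
both = data[age_mask & score_mask]

print(both)
```

Under these assumptions the work is roughly three passes instead of one, which is still O(n).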