Multiple conditions with & and | in Pandas - Time & Space Complexity
We want to understand how the time needed to filter data with multiple conditions changes as the data grows.
How does combining conditions with & and | affect the work done?
Analyze the time complexity of the following code snippet.
import pandas as pd

n = 100  # example size; must be even so column C has exactly n entries
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 0, -1),
    'C': ['x', 'y'] * (n // 2)
})

# Keep rows where A > 10 and (B < 50 or C == 'x').
filtered = df[(df['A'] > 10) & ((df['B'] < 50) | (df['C'] == 'x'))]
This code filters rows in a DataFrame using multiple conditions combined with & (and) and | (or).
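To make the work visible, the one-line filter can be split into its intermediate steps. This is a minimal sketch that reuses the example DataFrame; the names `mask_a`, `mask_b`, `mask_c`, and `combined` are introduced here only for illustration.

```python
import pandas as pd

n = 100  # same example size as the snippet above (must be even)
df = pd.DataFrame({
    "A": range(n),
    "B": range(n, 0, -1),
    "C": ["x", "y"] * (n // 2),
})

# Each comparison is one vectorized pass that yields a boolean Series of length n.
mask_a = df["A"] > 10
mask_b = df["B"] < 50
mask_c = df["C"] == "x"

# & and | combine the masks element-wise; each is one more pass over n booleans.
combined = mask_a & (mask_b | mask_c)

# Indexing with the combined mask gives the same rows as the one-line filter.
filtered_stepwise = df[combined]
filtered_inline = df[(df["A"] > 10) & ((df["B"] < 50) | (df["C"] == "x"))]
print(filtered_stepwise.equals(filtered_inline))
```

Every step touches each of the n rows a fixed number of times, which is why the total work stays proportional to n.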
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Comparing each row's values against the conditions and combining the results.
- How many times: Once per row for each condition, so about n times per condition, where n is the number of rows.
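Pandas evaluates each condition across all rows and then combines the resulting boolean masks element-wise with & and |. Because the number of conditions is fixed, the total number of checks grows in proportion to the number of rows.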
| Input Size (n) | Approx. Operations (per condition) |
|---|---|
| 10 | About 10 checks |
| 100 | About 100 checks |
| 1000 | About 1000 checks |
Pattern observation: The work grows directly with the number of rows.
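One way to check this pattern empirically is to time the same filter at several sizes. The helper `time_filter` below is a hypothetical name used only for this sketch, and the measured times depend on the machine and pandas version, so treat the printed numbers as a rough trend rather than exact values.

```python
import time
import pandas as pd

def time_filter(n, repeats=5):
    """Return the best wall-clock time in seconds for filtering n rows (n even)."""
    df = pd.DataFrame({
        "A": range(n),
        "B": range(n, 0, -1),
        "C": ["x", "y"] * (n // 2),
    })
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        df[(df["A"] > 10) & ((df["B"] < 50) | (df["C"] == "x"))]
        best = min(best, time.perf_counter() - start)
    return best

# Sizes chosen large enough that fixed per-call overhead does not dominate.
sizes = [10_000, 100_000, 1_000_000]
timings = {n: time_filter(n) for n in sizes}
for n, seconds in timings.items():
    print(f"n={n:>9,}  best of 5: {seconds * 1e3:.3f} ms")
```

For very small inputs the fixed overhead of each pandas call tends to dominate, so the linear trend usually only becomes visible at larger sizes.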
Time Complexity: O(n)
This means the time to filter grows linearly with the number of rows: doubling the rows roughly doubles the time.
[X] Wrong: "Using multiple conditions with & and | changes the time complexity, so filtering no longer scales linearly with the number of rows."
[OK] Correct: Each condition adds one more vectorized pass over the n rows, so k conditions cost about k × n operations. Because k is fixed and does not grow with the data, it is only a constant factor and the complexity stays O(n).
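The constant factor can be counted directly with a plain Python loop that mirrors the filter. This is only an illustrative model, not how pandas is implemented internally; `filter_with_count` is a hypothetical helper, and it evaluates all three conditions for every row because the vectorized version does not short-circuit.

```python
def filter_with_count(rows):
    """Filter (a, b, c) tuples with an explicit loop and count condition checks."""
    checks = 0
    kept = []
    for a, b, c in rows:
        # Evaluate every condition, as the vectorized filter does.
        cond_a = a > 10
        cond_b = b < 50
        cond_c = c == "x"
        checks += 3
        if cond_a and (cond_b or cond_c):
            kept.append((a, b, c))
    return kept, checks

for n in (10, 100, 1000):
    rows = list(zip(range(n), range(n, 0, -1), ["x", "y"] * (n // 2)))
    kept, checks = filter_with_count(rows)
    print(n, checks)  # prints 10 30, then 100 300, then 1000 3000
```

The count is always 3 × n: three conditions give a factor of three, while growth with n remains linear.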
Understanding how filtering with multiple conditions scales helps you explain data processing efficiency clearly and confidently.
What if we added a new condition that requires scanning another column? How would the time complexity change?