
query() for fast filtering in Pandas - Time & Space Complexity


Time Complexity: query() for fast filtering

O(n)

Understanding Time Complexity

We want to understand how the time to filter data using pandas' query() grows as the data size increases.

How does filtering with query() scale when we have more rows?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

# Two integer columns with 1000 rows each
df = pd.DataFrame({
    'age': range(1000),
    'score': range(1000, 2000)
})

# Keep only the rows that satisfy both conditions
filtered = df.query('age > 500 and score < 1600')

This code creates a DataFrame with 1000 rows and keeps only the rows where age is greater than 500 and score is less than 1600. With this data, that is the 99 rows with age 501 through 599.
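To make the filter concrete, here is a minimal sketch comparing query() with the equivalent boolean-mask form. The names via_query, via_mask, same_rows, and n_matches are illustrative and not part of the original snippet.

```python
import pandas as pd

df = pd.DataFrame({
    'age': range(1000),
    'score': range(1000, 2000)
})

# query() parses the expression string and evaluates it against the columns
via_query = df.query('age > 500 and score < 1600')

# Equivalent boolean-mask form: each comparison yields one True/False per row
mask = (df['age'] > 500) & (df['score'] < 1600)
via_mask = df[mask]

same_rows = via_query.equals(via_mask)  # both forms select the same rows
n_matches = len(via_query)              # number of rows that pass the filter
print(same_rows, n_matches)
```

Both forms have to produce one True/False value per row, which is why the cost depends on the number of rows.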

Identify Repeating Operations

Identify the loops, recursion, and array traversals that repeat.

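Primary operation: pandas evaluates the filter condition for each row. The comparisons run as vectorized passes over the age and score columns rather than as a Python loop, but every row is still examined.
How many times: Once for every row in the DataFrame.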
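The repeating operation can be sketched as an explicit loop that counts how many rows are checked. This is only a conceptual model, assuming one check per row; it is not how pandas runs the filter internally, and the names checks and kept are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'age': range(1000),
    'score': range(1000, 2000)
})

checks = 0   # how many rows we looked at
kept = []    # rows that pass the filter

# Conceptual model only: pandas does this work in vectorized code,
# not in a Python-level loop like this one
for age, score in zip(df['age'], df['score']):
    checks += 1
    if age > 500 and score < 1600:
        kept.append((age, score))

print(checks, len(kept))
```

The counter ends up equal to the number of rows, regardless of how many rows pass the filter.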

How Execution Grows With Input

As the number of rows grows, pandas must check each row once to apply the filter.

Pattern observation: The number of operations grows directly with the number of rows.
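One way to observe this pattern is to time the same query on DataFrames of increasing size. The sketch below is a rough measurement, not a rigorous benchmark: the helper time_query, the chosen sizes, and the repeat count are assumptions made for illustration, and the actual timings depend on the machine and the pandas version.

```python
import time
import pandas as pd

def time_query(n, repeats=5):
    """Return the best-of-`repeats` time (seconds) to filter n rows."""
    df = pd.DataFrame({
        'age': range(n),
        'score': range(1000, 1000 + n)
    })
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        df.query('age > 500 and score < 1600')
        best = min(best, time.perf_counter() - start)
    return best

sizes = [10_000, 100_000, 1_000_000]
timings = {n: time_query(n) for n in sizes}

for n, seconds in timings.items():
    print(f"{n:>9} rows: {seconds * 1e3:.3f} ms")
```

For small inputs the fixed cost of parsing the expression string tends to dominate, so the linear trend is usually easier to see between the larger sizes.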

Final Time Complexity

Time Complexity: O(n)

This means the time to filter grows linearly with the number of rows: doubling the rows roughly doubles the filtering time.

Common Mistake

[X] Wrong: "Using query() filters data instantly no matter how big the data is."

[OK] Correct: Even though query() is fast, it still checks every row once, so larger data takes proportionally more time.

Interview Connect

Understanding how filtering scales helps you explain your choices when working with large data in real projects.

Self-Check

"What if we used multiple query() calls chained together? How would the time complexity change?"
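A starting point for exploring this question is to run both forms side by side and look at how many rows each step receives. This is a sketch for experimentation only; the names combined, chained, and intermediate_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'age': range(1000),
    'score': range(1000, 2000)
})

# One call with both conditions
combined = df.query('age > 500 and score < 1600')

# Two chained calls: the second call only sees rows kept by the first
after_first = df.query('age > 500')
chained = after_first.query('score < 1600')

intermediate_rows = len(after_first)
print(combined.equals(chained))     # do both forms select the same rows?
print(len(df), intermediate_rows)   # rows scanned by the first and second call
```

Compare the number of rows each call has to scan with the total row count, then reason about how the overall work grows as the DataFrame gets larger.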