0
0
Pandasdata~5 mins

query() for fast filtering in Pandas

Choose your learning style9 modes available
Introduction

The query() method helps you quickly find rows in a table that match certain conditions. It makes filtering data easy and readable.

You want to select rows where a column value is greater than a number.
You need to filter data based on multiple conditions combined with AND or OR.
You want to write filtering code that looks clean and is easy to understand.
You are working with large tables and want a fast way to filter rows.
You prefer writing conditions as strings instead of using complex brackets.
Syntax
Pandas
DataFrame.query('condition')

The condition is a string that looks like a simple expression, e.g., 'age > 30'.

You can use column names directly inside the string without extra brackets.

Examples
Selects rows where the age column is greater than 25.
Pandas
df.query('age > 25')
Filters rows where the city column equals 'New York'.
Pandas
df.query('city == "New York"')
Filters rows where age is over 20 and city is 'Chicago'.
Pandas
df.query('age > 20 and city == "Chicago"')
Selects rows where score is at least 80 or grade is 'A'.
Pandas
df.query('score >= 80 or grade == "A"')
Sample Program

This code creates a small table of people with their ages and cities. It then uses query() to find people older than 30, and separately, people in New York who are younger than 30.

Pandas
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 32, 18, 47],
    'city': ['New York', 'Chicago', 'New York', 'Chicago']
}
df = pd.DataFrame(data)

# Filter people older than 30
older_than_30 = df.query('age > 30')
print(older_than_30)

# Filter people in New York younger than 30
ny_young = df.query('city == "New York" and age < 30')
print(ny_young)
OutputSuccess
Important Notes

Use double quotes inside the query string if your values are strings, like city == "New York".

query() can be faster than normal filtering with brackets for large data.

Column names with spaces or special characters need backticks, e.g., df.query('`my column` > 5').

Summary

query() lets you filter rows using simple, readable strings.

It works well for conditions with one or more columns combined with and / or.

It can make your code cleaner and sometimes faster for big tables.