
How to Use query() in pandas for Data Filtering

Use the DataFrame.query() method in pandas to filter rows by writing a condition as a string. It returns the rows for which the condition is True, and you can refer to column names directly inside the string.

Syntax

The basic syntax of query() is:

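  • DataFrame.query(expr, inplace=False, **kwargs)
  • expr: A string containing a boolean expression; column names can be used as variables.
  • inplace: If True, filters the DataFrame in place and returns None; otherwise returns a new, filtered DataFrame.
  • **kwargs: Extra keyword arguments passed through to DataFrame.eval().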
```python
# General pattern: column_name, operator, and value are placeholders
filtered_df = df.query('column_name operator value')
```
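To see how the inplace parameter changes behavior, here is a minimal sketch on a small made-up DataFrame. With the default, the original DataFrame is left unchanged; with inplace=True, the DataFrame itself is filtered and the call returns None.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 35, 30]})

# Default (inplace=False): a new DataFrame is returned, df is unchanged
result = df.query('age > 28')
print(len(result), len(df))  # 2 3

# inplace=True: df itself is filtered and the method returns None
returned = df.query('age > 28', inplace=True)
print(returned)  # None
print(len(df))   # 2
```

Because the in-place form returns None, assigning its result (for example df = df.query(..., inplace=True)) would replace df with None.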

Example

This example shows how to filter rows where the age column is greater than 30.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 35, 30, 40],
        'city': ['NY', 'LA', 'NY', 'Chicago']}
df = pd.DataFrame(data)

filtered_df = df.query('age > 30')
print(filtered_df)
```

Output

```
    name  age     city
1    Bob   35       LA
3  David   40  Chicago
```
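For a simple comparison like this, query() selects the same rows as ordinary boolean-mask indexing; the string is a more compact way of writing the mask. A minimal check, reusing the sample data from the example:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'age': [25, 35, 30, 40],
                   'city': ['NY', 'LA', 'NY', 'Chicago']})

# Filter with a string expression
via_query = df.query('age > 30')

# Equivalent boolean-mask indexing
via_mask = df[df['age'] > 30]

print(via_query.equals(via_mask))  # True
print(via_query.index.tolist())    # [1, 3]
```

Both results keep the original index labels (1 and 3), which is why the output above starts at row 1 rather than 0.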

Common Pitfalls

Common mistakes when using query() include:

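  • Using column names that contain spaces or special characters without wrapping them in backticks.
  • Referring to a Python variable by its bare name instead of prefixing it with @.
  • Mixing up quotes, for example using the same quote character for a string literal inside the expression as for the expression string itself.

Wrap column names that contain spaces in backticks, prefix Python variables with @, and use one quote style for the expression and the other for string literals inside it.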
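The quoting pitfall shows up when an expression compares a column against a string literal. A minimal sketch using the same sample data: the literal needs a different quote character from the one enclosing the whole expression, or it can be kept in a Python variable and referenced with @ (target_city is a name chosen for illustration).

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'age': [25, 35, 30, 40],
                   'city': ['NY', 'LA', 'NY', 'Chicago']})

# Wrong: the inner single quote ends the Python string early (SyntaxError)
# df.query('city == 'NY'')

# Correct: double quotes outside, single quotes around the literal
ny_rows = df.query("city == 'NY'")
print(ny_rows['name'].tolist())  # ['Alice', 'Charlie']

# Alternative: store the literal in a variable and reference it with @
target_city = 'LA'
la_rows = df.query('city == @target_city')
print(la_rows['name'].tolist())  # ['Bob']
```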

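```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'age': [25, 35, 30, 40],
                   'city': ['NY', 'LA', 'NY', 'Chicago']})

# Wrong: column name with space without backticks
# df.query('total price > 100')  # This will raise an error

# Correct: use backticks for column names with spaces
# df.query('`total price` > 100')

# Correct: prefix a Python variable with @ to use it inside the query
threshold = 30
filtered = df.query('age > @threshold')
print(filtered)
```

Output

```
    name  age     city
1    Bob   35       LA
3  David   40  Chicago
```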
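The backtick lines above are commented out because the sample DataFrame has no total price column. Here is a small runnable sketch with a made-up orders table whose column name contains a space. It also shows what happens when a variable is referenced without @: pandas treats the bare name as a column, cannot find it, and raises an UndefinedVariableError, which is a subclass of Python's NameError.

```python
import pandas as pd

# Hypothetical order data; 'total price' contains a space
orders = pd.DataFrame({'item': ['pen', 'lamp', 'desk'],
                       'total price': [5, 120, 300]})

# Backticks let query() read 'total price' as a single column name
expensive = orders.query('`total price` > 100')
print(expensive['item'].tolist())  # ['lamp', 'desk']

limit = 100

# Without @, 'limit' is looked up as a column; none exists, so this fails
raised = False
try:
    orders.query('`total price` > limit')
except NameError as err:  # UndefinedVariableError subclasses NameError
    raised = True
    print('Error:', err)

# With @, the local Python variable is used
with_at = orders.query('`total price` > @limit')
print(with_at['item'].tolist())  # ['lamp', 'desk']
```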

Quick Reference

| Feature | Description | Example |
|---|---|---|
| Basic filter | Filter rows by a condition | `df.query('age > 30')` |
| Use variables | Use Python variables with @ | `df.query('age > @min_age')` |
| Columns with spaces | Use backticks around names | ``df.query('`total price` > 100')`` |
| In-place filtering | Modify the original DataFrame | `df.query('age > 30', inplace=True)` |

Key Takeaways

  • Use df.query('condition') to filter DataFrame rows by column conditions.
  • Refer to columns directly by name inside the query string.
  • Use backticks for column names with spaces or special characters.
  • Prefix Python variables with @ to use them inside the query string.
  • query() returns a new, filtered DataFrame unless inplace=True is set, in which case it modifies the original and returns None.