How to Use query() in pandas for Data Filtering
Use the query() method of a pandas DataFrame to filter rows by writing a condition as a string. It returns the rows for which the condition is true, and you can refer to column names directly inside the string.
Syntax
The basic syntax of query() is:
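```python
DataFrame.query(expr, inplace=False, **kwargs)
```
- expr: A string expression used to filter rows, in which column names act as variables.
- inplace: If True, modifies the DataFrame in place and returns None; otherwise returns a filtered copy.
- **kwargs: Additional keyword arguments passed on to DataFrame.eval().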
```python
# General pattern: replace column_name, operator and value with your own
filtered_df = df.query('column_name operator value')
```
Example
This example shows how to filter rows where the age column is greater than 30.
```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 35, 30, 40],
        'city': ['NY', 'LA', 'NY', 'Chicago']}
df = pd.DataFrame(data)

# Keep only the rows whose age is greater than 30
filtered_df = df.query('age > 30')
print(filtered_df)
```
Output
```
    name  age     city
1    Bob   35       LA
3  David   40  Chicago
```
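query('age > 30') selects the same rows as ordinary boolean indexing with a mask. The sketch below rebuilds the example DataFrame and compares the two approaches. It also shows two conditions combined with `and` inside the query string. `via_query`, `via_mask` and `older_ny` are illustrative names only.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 35, 30, 40],
        'city': ['NY', 'LA', 'NY', 'Chicago']}
df = pd.DataFrame(data)

# Filtering with a query string ...
via_query = df.query('age > 30')
# ... selects the same rows as a boolean mask
via_mask = df[df['age'] > 30]
print(via_query.equals(via_mask))  # True

# Several conditions can be combined with `and` / `or` inside the string
older_ny = df.query('age >= 30 and city == "NY"')
print(older_ny)
```

Only Charlie (age 30, NY) satisfies both conditions. Note that `>=` includes the boundary value, whereas `>` in the main example excludes it.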
Common Pitfalls
Common mistakes when using query() include:
- Using column names with spaces or special characters without backticks.
- Referring to a Python variable inside the query string without the @ prefix.
- Quoting a string value with the same kind of quote that wraps the whole expression, e.g. 'city == 'NY'' instead of "city == 'NY'".
Wrap column names that contain spaces in backticks, and prefix Python variables with @.
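```python
import pandas as pd

# Sample data, same as in the example above
data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 35, 30, 40],
        'city': ['NY', 'LA', 'NY', 'Chicago']}
df = pd.DataFrame(data)

# Wrong: column name with a space, no backticks (this raises an error)
# df.query('total price > 100')

# Correct: wrap column names containing spaces in backticks
# df.query('`total price` > 100')

# Reference a Python variable inside the query with the @ prefix
threshold = 30
filtered = df.query('age > @threshold')
print(filtered)
```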
Output
```
    name  age     city
1    Bob   35       LA
3  David   40  Chicago
```
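The `total price` lines in the previous example are commented out because the sample DataFrame has no such column. The sketch below uses a hypothetical `orders` DataFrame (made-up data) that does have a column name containing a space. It also shows safe quoting for a string value: wrap the whole expression in double quotes and the value in single quotes.

```python
import pandas as pd

# Hypothetical order data; 'total price' contains a space
orders = pd.DataFrame({
    'customer': ['Alice', 'Bob', 'Charlie'],
    'total price': [120, 80, 150],
    'city': ['NY', 'LA', 'NY'],
})

# Backticks let query() refer to the 'total price' column
big_orders = orders.query('`total price` > 100')

# Outer double quotes, inner single quotes around the string value
ny_orders = orders.query("city == 'NY'")

# Both ideas together, plus a local variable referenced with @
min_total = 130
big_ny = orders.query("`total price` > @min_total and city == 'NY'")

print(big_orders['customer'].tolist())  # ['Alice', 'Charlie']
print(big_ny['customer'].tolist())      # ['Charlie']
```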
Quick Reference
| Feature | Description | Example |
|---|---|---|
| Basic filter | Filter rows by condition | df.query('age > 30') |
| Use variables | Use Python variables with @ | df.query('age > @min_age') |
| Columns with spaces | Use backticks around names | df.query('`total price` > 100') |
| Inplace filtering | Modify original DataFrame | df.query('age > 30', inplace=True) |
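The inplace row of the table behaves differently from the others. With inplace=True, query() changes the DataFrame itself and returns None instead of a new DataFrame. The minimal sketch below checks both modes on made-up sample data; `copy_result` and `inplace_result` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'age': [25, 35, 30, 40]})

# Default (inplace=False): a new filtered DataFrame is returned
copy_result = df.query('age > 30')
print(len(copy_result), len(df))  # 2 4  (df still has all rows)

# inplace=True: df itself is filtered and query() returns None
inplace_result = df.query('age > 30', inplace=True)
print(inplace_result, len(df))    # None 2
```

A common mistake is writing `df = df.query('age > 30', inplace=True)`. This replaces `df` with None, so use either reassignment or inplace=True, not both.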
Key Takeaways
Use df.query('condition') to filter DataFrame rows by column conditions.
Refer to columns directly by name inside the query string.
Use backticks for column names with spaces or special characters.
Pass Python variables inside query with @ prefix.
query() returns a filtered copy unless inplace=True is set.