How to Use Boolean Indexing in pandas for Data Selection
Use boolean indexing in pandas by creating a condition that evaluates to True or False for each row or element, then pass that condition inside the DataFrame or Series brackets to filter the data. For example, `df[df['column'] > 5]` returns the rows where the column value is greater than 5.
Syntax
Boolean indexing uses a condition that evaluates to a Series of True or False values (often called a boolean mask). That Series is then used to select the matching rows or elements from a DataFrame or Series.
- `df[condition]`: selects the rows where `condition` is `True`.
- `condition`: a comparison or logical expression that returns a boolean Series.
```python
df[condition]  # Example condition: df['column'] > 5
```
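It can help to look at the condition on its own before using it. The following is a minimal sketch with made-up data; the column name `'column'` only mirrors the placeholder above. It shows that the comparison produces a boolean Series and that the same kind of mask also filters a plain Series.

```python
import pandas as pd

# Made-up data; 'column' mirrors the placeholder name used above
df = pd.DataFrame({'column': [3, 8, 5, 12]})

# The comparison returns a boolean Series aligned with df's index
condition = df['column'] > 5
print(condition.tolist())        # [False, True, False, True]

# Only the rows whose condition value is True are kept
filtered = df[condition]
print(filtered.index.tolist())   # [1, 3]

# The same idea works on a Series: filter its elements with a mask built from it
s = pd.Series([3, 8, 5, 12])
print(s[s > 5].tolist())         # [8, 12]
```

Note that the filtered result keeps the original index labels (1 and 3) instead of renumbering them.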
Example
This example shows how to filter a DataFrame to get rows where the 'Age' column is greater than 30.
```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 35, 30, 40],
        'City': ['NY', 'LA', 'NY', 'Chicago']}
df = pd.DataFrame(data)

# Boolean condition to select rows where Age > 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
```
Output
```
    Name  Age     City
1    Bob   35       LA
3  David   40  Chicago
```
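The same kind of condition can also be passed to `.loc`, which selects rows and columns in one step. The sketch below rebuilds the example data so it runs on its own; it also shows that Charlie (Age 30) is left out above only because `>` is strict, whereas `>=` keeps that row.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 35, 30, 40],
        'City': ['NY', 'LA', 'NY', 'Chicago']}
df = pd.DataFrame(data)

# Rows where Age > 30, restricted to the Name and City columns
subset = df.loc[df['Age'] > 30, ['Name', 'City']]
print(subset)

# >= includes the boundary value, so Charlie (Age 30) is now kept
at_least_30 = df[df['Age'] >= 30]
print(at_least_30['Name'].tolist())  # ['Bob', 'Charlie', 'David']
```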
Common Pitfalls
Common mistakes include:
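- Using a single equals sign `=` instead of the double equals `==` for comparison.
- Forgetting that the condition must be a boolean Series (or array) with one value per row, aligned with the DataFrame's index.
- Passing something other than a boolean condition inside the brackets. For example, `df[df['A']]` treats the values of column `A` as column labels rather than as a filter, which usually raises a `KeyError`.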
Always use `==` for equality checks, and make sure the condition has exactly one True/False value per row of the DataFrame.
```python
import pandas as pd

data = {'A': [1, 2, 3]}
df = pd.DataFrame(data)

# Wrong: using = instead of == causes a SyntaxError
# filtered = df[df['A'] = 2]

# Right:
filtered = df[df['A'] == 2]
print(filtered)
```
Output
```
   A
1  2
```
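The length requirement from the list of pitfalls can be checked directly. In this sketch, which uses the same one-column data, a plain list of booleans with the wrong number of entries is rejected with a `ValueError`, while a mask built from the DataFrame itself always lines up with its rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# A boolean list needs one entry per row; 2 entries for 3 rows is rejected
try:
    df[[True, False]]
    wrong_length_rejected = False
except ValueError as exc:
    wrong_length_rejected = True
    print("ValueError:", exc)

# A condition derived from df shares its index, so it always aligns
mask = df['A'] >= 2
print(df[mask]['A'].tolist())  # [2, 3]
```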
Quick Reference
| Operation | Example | Description |
|---|---|---|
| Greater than | df[df['col'] > 5] | Select rows where 'col' is greater than 5 |
| Equal to | df[df['col'] == 10] | Select rows where 'col' equals 10 |
| Multiple conditions | df[(df['col1'] > 5) & (df['col2'] == 'A')] | Select rows matching both conditions |
| Not equal | df[df['col'] != 3] | Select rows where 'col' is not 3 |
| Using isin | df[df['col'].isin([1,2,3])] | Select rows where 'col' is in the list |
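The rows of this table can be tried out together. The sketch below uses a small, made-up DataFrame whose column names (`col`, `col1`, `col2`) match the placeholders in the table; the values are arbitrary and chosen only so that each filter keeps a different set of rows.

```python
import pandas as pd

# Made-up data using the placeholder column names from the table
df = pd.DataFrame({
    'col':  [1, 3, 7, 10],
    'col1': [2, 6, 8, 4],
    'col2': ['A', 'A', 'B', 'A'],
})

greater = df[df['col'] > 5]                          # rows 2, 3
equal = df[df['col'] == 10]                          # row 3
both = df[(df['col1'] > 5) & (df['col2'] == 'A')]    # row 1
not_equal = df[df['col'] != 3]                       # rows 0, 2, 3
in_list = df[df['col'].isin([1, 2, 3])]              # rows 0, 1

# Print which index labels each filter kept
for name, result in [('greater', greater), ('equal', equal), ('both', both),
                     ('not_equal', not_equal), ('in_list', in_list)]:
    print(name, result.index.tolist())
```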
Key Takeaways
Boolean indexing filters data by using conditions that return True or False for each row.
Always use double equals (==) for equality checks in conditions.
Combine multiple conditions with & (and) or | (or), wrapping each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==.
Boolean indexing returns a subset of the DataFrame or Series matching the condition.
Ensure the condition returns a boolean Series aligned with the DataFrame's index.
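A minimal sketch with made-up data (the column name `a` is arbitrary) shows what goes wrong when the parentheses are dropped or when Python's `and` is used instead of `&`: both versions end up asking a whole Series for a single True/False value, which pandas refuses with a `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})

# Correct: each comparison in parentheses, combined element-wise with &
between = df[(df['a'] > 1) & (df['a'] < 4)]
print(between['a'].tolist())  # [2, 3]

# Correct: | keeps rows that match either condition
outside = df[(df['a'] < 2) | (df['a'] > 4)]
print(outside['a'].tolist())  # [1, 5]

# Wrong: 'and' needs a single True/False from a whole Series
try:
    df[(df['a'] > 1) and (df['a'] < 4)]
    and_rejected = False
except ValueError:
    and_rejected = True

# Wrong: without parentheses, & is applied before > and <, so Python
# evaluates df['a'] > (1 & df['a']) < 4, a chained comparison that
# also needs a single True/False from a Series
try:
    df[df['a'] > 1 & df['a'] < 4]
    no_parens_rejected = False
except ValueError:
    no_parens_rejected = True

print(and_rejected, no_parens_rejected)  # True True
```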