Pandas · How-To · Beginner · 3 min read

How to Use Boolean Indexing in pandas for Data Selection

To use boolean indexing in pandas, create a condition that returns True or False for each row or element, then pass that condition inside the DataFrame or Series brackets to filter the data. For example, df[df['column'] > 5] returns the rows where the column value is greater than 5.

Syntax

Boolean indexing uses a condition that returns a boolean Series, with one True or False value per row or element. That Series is then used to select the matching rows or elements from a DataFrame or Series.

  • df[condition]: Select rows where condition is True.
  • condition: A comparison or logical expression that returns a boolean Series.
python
df[condition]

# Example condition:
df['column'] > 5
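It can help to look at the condition on its own before using it as a filter. The sketch below assumes a small made-up DataFrame (the values are illustrative, and 'column' simply mirrors the placeholder name above) and stores the condition in a variable named mask, which is a naming choice rather than a pandas requirement.

```python
import pandas as pd

# Hypothetical sample data; 'column' mirrors the placeholder name used above.
df = pd.DataFrame({'column': [3, 8, 5, 10]})

# The comparison is evaluated first and yields one True/False value per row.
mask = df['column'] > 5
print(mask.tolist())  # [False, True, False, True]

# Passing the mask inside the brackets keeps only the rows marked True.
selected = df[mask]
print(selected)
```

Because the mask is built from df itself, it carries the same index as df, which is what lets pandas match each True or False value to its row.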

Example

This example shows how to filter a DataFrame to get rows where the 'Age' column is greater than 30.

python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 35, 30, 40],
        'City': ['NY', 'LA', 'NY', 'Chicago']}
df = pd.DataFrame(data)

# Boolean condition to select rows where Age > 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
Output
    Name  Age     City
1    Bob   35       LA
3  David   40  Chicago
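The same pattern applies to a Series, not only to a DataFrame. As a minimal sketch, the ages from the example are placed in a standalone Series indexed by name (the variable names ages and over_30 are illustrative):

```python
import pandas as pd

# The ages from the example above, as a Series indexed by name.
ages = pd.Series([25, 35, 30, 40],
                 index=['Alice', 'Bob', 'Charlie', 'David'])

# The condition produces a boolean Series with the same index,
# so only the elements marked True are kept.
over_30 = ages[ages > 30]
print(over_30)
```

Here the result keeps the original labels (Bob and David), just as the DataFrame example keeps the original row numbers 1 and 3.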

Common Pitfalls

Common mistakes include:

  • Using a single equals sign = instead of double equals == for comparison.
  • Forgetting that the condition must return a boolean Series matching the DataFrame's index.
  • Passing something other than a boolean condition inside the brackets, such as a column name on its own, which selects a column instead of filtering rows.

Always use == for equality checks, and make sure the condition provides one True or False value per row, aligned with the DataFrame's index.

python
import pandas as pd

data = {'A': [1, 2, 3]}
df = pd.DataFrame(data)

# Wrong: using = instead of ==
# filtered = df[df['A'] = 2]  # This will cause a syntax error

# Right:
filtered = df[df['A'] == 2]
print(filtered)
Output
   A
1  2
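The index-alignment pitfall is easier to see with a mask that was not built from the DataFrame being filtered. The sketch below uses a hypothetical mask named stray_mask whose index labels do not exist in df. The exact exception class and message depend on the pandas version, so the code catches a generic Exception and prints its name instead of assuming one.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})  # index labels 0, 1, 2

# Hypothetical mask whose index labels (10, 11, 12) are not present in df.
stray_mask = pd.Series([True, False, True], index=[10, 11, 12])

try:
    df[stray_mask]
    raised = False
except Exception as exc:  # exact exception class varies across pandas versions
    raised = True
    print(type(exc).__name__)

# Building the mask from df itself keeps the labels aligned.
aligned = df[df['A'] != 2]
print(aligned)
```

Deriving the condition from a column of the same DataFrame, as in the last two lines, is the simplest way to avoid the mismatch.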

Quick Reference

| Operation | Example | Description |
| --- | --- | --- |
| Greater than | df[df['col'] > 5] | Select rows where 'col' is greater than 5 |
| Equal to | df[df['col'] == 10] | Select rows where 'col' equals 10 |
| Multiple conditions | df[(df['col1'] > 5) & (df['col2'] == 'A')] | Select rows matching both conditions |
| Not equal | df[df['col'] != 3] | Select rows where 'col' is not 3 |
| Using isin | df[df['col'].isin([1,2,3])] | Select rows where 'col' is in the list |
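A few of these operations can be tried on the Name/Age/City data from the example earlier. This is a sketch only: the thresholds and city values are chosen for illustration, and the result names (both, either, in_list) are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 35, 30, 40],
                   'City': ['NY', 'LA', 'NY', 'Chicago']})

# Both conditions must hold (&); each condition sits in its own parentheses.
both = df[(df['Age'] >= 30) & (df['City'] == 'NY')]
print(both['Name'].tolist())     # ['Charlie']

# Either condition may hold (|).
either = df[(df['Age'] > 35) | (df['City'] == 'LA')]
print(either['Name'].tolist())   # ['Bob', 'David']

# Membership test against a list of allowed values.
in_list = df[df['City'].isin(['NY', 'LA'])]
print(in_list['Name'].tolist())  # ['Alice', 'Bob', 'Charlie']
```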

Key Takeaways

  • Boolean indexing filters data by using conditions that return True or False for each row.
  • Always use double equals (==) for equality checks in conditions.
  • Combine multiple conditions with & (and) or | (or), using parentheses around each condition.
  • Boolean indexing returns a subset of the DataFrame or Series matching the condition.
  • Ensure the condition returns a boolean Series aligned with the DataFrame's index.
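To illustrate why & and | are used rather than Python's and / or keywords: the keywords try to reduce each whole Series to a single True or False, which pandas refuses to do. A minimal sketch, reusing the Name/Age/City data from the example (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 35, 30, 40],
                   'City': ['NY', 'LA', 'NY', 'Chicago']})

# Wrong: 'and' asks for the truth value of an entire Series.
try:
    df[(df['Age'] > 30) and (df['City'] == 'LA')]
    error_name = None
except ValueError as exc:
    error_name = type(exc).__name__
    print(error_name)  # ValueError

# Right: & compares the two conditions row by row.
matched = df[(df['Age'] > 30) & (df['City'] == 'LA')]
print(matched['Name'].tolist())  # ['Bob']
```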