
How to Use where in pandas: Filter Data Conditionally

In pandas, the where() method keeps values where a condition is True and replaces all other values with NaN or with a value you specify. Unlike boolean indexing with [] or loc, which drops the rows that fail the condition, where() returns an object with the same shape as the original.

Syntax

The where() method syntax is:

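  • DataFrame.where(cond, other=nan, *, inplace=False, axis=None, level=None)

cond: A boolean condition (a boolean Series, DataFrame, array, or a callable that returns one). Values are kept where it is True.

other: Value used to replace entries where cond is False (default is NaN).

inplace: If True, modifies the original DataFrame and returns None.

Note: pandas 1.x releases also accepted errors and try_cast. Both parameters were deprecated and then removed in pandas 2.0.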

python
df.where(cond, other=np.nan)
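
To make the cond and other parameters concrete, here is a minimal sketch that replaces failing values with 0 instead of NaN and also passes cond as a callable. The variable names with_zero and with_callable are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"score": [45, 67, 89, 34, 56]})

# Replace values that fail the condition with 0 instead of the default NaN
with_zero = df.where(df["score"] > 50, other=0)

# cond can also be a callable: it receives the DataFrame and returns a boolean mask
with_callable = df.where(lambda d: d > 50, other=0)

print(with_zero)
print(with_zero.equals(with_callable))
```

Because the replacement value is an integer, no NaN is introduced in this sketch.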

Example

This example shows how to keep values greater than 50 and replace others with NaN using where().

python
import pandas as pd
import numpy as np

data = {'score': [45, 67, 89, 34, 56]}
df = pd.DataFrame(data)

# Keep scores > 50, replace others with NaN
df_filtered = df.where(df['score'] > 50)
print(df_filtered)
Output
   score
0    NaN
1   67.0
2   89.0
3    NaN
4   56.0
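
The example above uses a DataFrame with a single column. When a frame has several columns and only one of them should be masked, one option is to call where() on that column as a Series. The sketch below assumes a hypothetical name column and a hypothetical score_masked output column, neither of which appears in the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara", "Dev", "Eve"],  # hypothetical extra column
    "score": [45, 67, 89, 34, 56],
})

# Series.where masks only the score values; the name column is not touched
df["score_masked"] = df["score"].where(df["score"] > 50)

print(df)
```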

Common Pitfalls

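A common mistake is to confuse where() with boolean indexing. where() keeps the original shape and replaces values where the condition is False, while boolean indexing drops the rows that fail the condition.

Also, np.nan is only available after import numpy as np. The import is needed only when you pass np.nan explicitly; the default value of other is already NaN.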

python
import pandas as pd
import numpy as np

data = {'score': [45, 67, 89, 34, 56]}
df = pd.DataFrame(data)

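# Boolean indexing drops the rows that fail the condition
# (not what you want if the original shape must be preserved)
filtered_wrong = df[df['score'] > 50]

# where() keeps all rows and replaces the failing values with NaN
filtered_right = df.where(df['score'] > 50)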

print('Boolean indexing result:\n', filtered_wrong)
print('\nWhere method result:\n', filtered_right)
Output
Boolean indexing result:
    score
1     67
2     89
4     56

Where method result:
    score
0    NaN
1   67.0
2   89.0
3    NaN
4   56.0
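
The outputs above also show a side effect worth knowing: after where() the scores print as 67.0 rather than 67, because an integer column has to be converted to float to hold NaN. The following sketch checks the dtypes and shows that an integer replacement value keeps the column as integers. The value -1 is an arbitrary placeholder chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"score": [45, 67, 89, 34, 56]})

# Default replacement is NaN, which forces the integer column to float
masked = df.where(df["score"] > 50)
print(df["score"].dtype)
print(masked["score"].dtype)

# An integer replacement value avoids introducing NaN
masked_int = df.where(df["score"] > 50, other=-1)
print(masked_int["score"].dtype)

# Dropping the NaN rows recovers the same rows boolean indexing would return
print(masked.dropna().index.tolist())
```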

Quick Reference

Parameter | Description
--- | ---
cond | Boolean condition; values are kept where it is True
other | Value used where the condition is False (default NaN)
inplace | Modify the original DataFrame if True (default False)
axis | Alignment axis, if needed (optional)
errors | 'raise' or 'ignore' in pandas 1.x; removed in pandas 2.0
try_cast | Tried to cast the result back to the original dtype in pandas 1.x; removed in pandas 2.0
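
As a small illustration of the inplace parameter, the sketch below modifies the DataFrame directly. With inplace=True the call returns None, so the result should not be assigned back to df. The name result is used only to demonstrate that return value.

```python
import pandas as pd

df = pd.DataFrame({"score": [45, 67, 89, 34, 56]})

# inplace=True changes df itself; the method returns None
result = df.where(df["score"] > 50, other=0, inplace=True)

print(result)
print(df)
```

In many codebases the non-inplace form (df = df.where(...)) is preferred because it is easier to chain and reason about.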

Key Takeaways

  • Use where() to keep values where a condition is True and replace the others without dropping rows.
  • The default replacement where the condition is False is NaN, but you can specify any value with the other parameter.
  • where() keeps the original DataFrame shape, unlike boolean indexing, which filters rows out.
  • Import numpy only if you pass np.nan explicitly; the default replacement is already NaN.
  • Use inplace=True to modify the original DataFrame directly; the call then returns None.