How to Use where in pandas: Filter Data Conditionally
In pandas, use the where() method to keep values where a condition is True and replace the others with NaN or a specified value. It filters data conditionally without dropping rows, unlike loc or boolean indexing.
Syntax
The where() method syntax is:
DataFrame.where(cond, other=nan, *, inplace=False, axis=None, level=None)
Older pandas releases also accepted errors='raise' and try_cast=False; both were removed in pandas 2.0.
cond: A boolean mask (or a callable that returns one); values are kept where it is True.
other: Value to replace where cond is False (default is NaN).
inplace: If True, modifies the original DataFrame.
python
df.where(cond, other=np.nan)
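As a minimal sketch of how the other argument changes the result, the snippet below fills failing values with 0 instead of NaN and also passes a callable as cond. The score column and the threshold of 50 are illustrative assumptions.

```python
import pandas as pd

# Illustrative data; the column name and threshold are assumptions
df = pd.DataFrame({'score': [45, 67, 89, 34, 56]})

# Replace values that fail the condition with 0 instead of the default NaN
zero_filled = df.where(df['score'] > 50, other=0)

# cond may also be a callable that receives the DataFrame and returns a mask
callable_filled = df.where(lambda d: d > 50, other=0)

print(zero_filled)
```

Both calls should produce the same frame: rows 0 and 3 hold 0, and the other scores are unchanged.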
Example
This example shows how to keep values greater than 50 and replace others with NaN using where().
python
import pandas as pd
import numpy as np

data = {'score': [45, 67, 89, 34, 56]}
df = pd.DataFrame(data)

# Keep scores > 50, replace others with NaN
df_filtered = df.where(df['score'] > 50)
print(df_filtered)
Output
   score
0    NaN
1   67.0
2   89.0
3    NaN
4   56.0
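The scores print as 67.0 rather than 67 because NaN is a floating-point value, so pandas converts the integer column to float64. The sketch below reuses the example data to check this, and shows one possible way to keep integers by supplying an integer fill value. The sentinel -1 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'score': [45, 67, 89, 34, 56]})

# Default fill is NaN, which forces the column to a float dtype
with_nan = df.where(df['score'] > 50)
print(with_nan['score'].dtype)

# An integer fill value (assumed sentinel: -1) lets the column stay integer
with_sentinel = df.where(df['score'] > 50, other=-1)
print(with_sentinel['score'].dtype)
```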
Common Pitfalls
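One common mistake is confusing where() with boolean indexing. where() keeps the original shape and replaces values where the condition is False, while boolean indexing removes the rows that fail the condition.
Also, if you pass np.nan explicitly as other, import numpy first; otherwise the name np is undefined and Python raises a NameError. The default other is already NaN, so no import is needed when you omit it.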
python
import pandas as pd
import numpy as np

data = {'score': [45, 67, 89, 34, 56]}
df = pd.DataFrame(data)

# Wrong: boolean indexing filters rows
filtered_wrong = df[df['score'] > 50]

# Right: where keeps all rows, replaces values
filtered_right = df.where(df['score'] > 50)

print('Boolean indexing result:\n', filtered_wrong)
print('\nWhere method result:\n', filtered_right)
Output
Boolean indexing result:
    score
1     67
2     89
4     56

Where method result:
    score
0    NaN
1   67.0
2   89.0
3    NaN
4   56.0
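The difference in shape can also be checked directly. This sketch, using the same example data, compares the two results and shows that dropping the NaN rows from the where() result leaves the same row labels that boolean indexing returns.

```python
import pandas as pd

df = pd.DataFrame({'score': [45, 67, 89, 34, 56]})
mask = df['score'] > 50

by_indexing = df[mask]      # rows that fail the condition are removed
by_where = df.where(mask)   # all rows kept, failing values become NaN

print(by_indexing.shape)    # (3, 1)
print(by_where.shape)       # (5, 1)

# Dropping the NaN rows recovers the same row labels as boolean indexing
same_rows = by_where.dropna().index.equals(by_indexing.index)
print(same_rows)            # True
```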
Quick Reference
| Parameter | Description |
|---|---|
| cond | Boolean condition to keep values where True |
| other | Value to replace where condition is False (default NaN) |
| inplace | Modify original DataFrame if True (default False) |
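| axis | Alignment axis, used when other must be aligned to the DataFrame (optional) |
| level | Alignment level for a MultiIndex (optional) |
| errors | Accepted by older releases ('raise' or 'ignore'); removed in pandas 2.0 |
| try_cast | Accepted by older releases (try to cast the result back to the original dtype); removed in pandas 2.0 |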
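The axis parameter matters mainly when other is a Series that has to be lined up with the rows or columns. The sketch below is one illustration under assumed data: the math and art columns and the idea of filling with each row's best score are hypothetical.

```python
import pandas as pd

# Hypothetical two-subject table; names and values are illustrative only
df = pd.DataFrame({'math': [45, 67, 89], 'art': [70, 30, 55]})

# One replacement value per row: each row's highest score
row_best = df.max(axis=1)

# axis=0 aligns the Series with the row index, so every failing cell
# in a row is replaced by that row's value from row_best
result = df.where(df > 50, row_best, axis=0)
print(result)
```

Here the 45 in row 0 should become 70 and the 30 in row 1 should become 67, while every value above 50 is left as it was.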
Key Takeaways
Use where() to keep values where a condition is True and replace others without dropping rows.
The default replacement for False conditions is NaN, but you can specify any value with the other parameter.
where() keeps the original DataFrame shape, unlike boolean indexing, which filters rows out.
Remember to import numpy to use np.nan as the replacement value.
Use inplace=True to modify the original DataFrame directly.
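A short sketch of the inplace behavior, again with the illustrative score data: with inplace=True the call returns None and the original DataFrame is changed, so the result should not be assigned to a variable you intend to use.

```python
import pandas as pd

df = pd.DataFrame({'score': [45, 67, 89, 34, 56]})

# With inplace=True the method returns None and modifies df itself
returned = df.where(df['score'] > 50, other=0, inplace=True)

print(returned)   # None
print(df)
```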