How to Use np.where with pandas for Conditional Selection
Use np.where(condition, value_if_true, value_if_false) with pandas columns to create new columns or modify existing ones based on a condition. It works like an if-else statement applied element-wise to a pandas Series or DataFrame column.
Syntax
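The basic syntax of np.where is:
```python
np.where(condition, value_if_true, value_if_false)
```
- condition: a boolean array or Series, typically a comparison on a pandas column such as df['score'] >= 60.
- value_if_true: the value used where the condition is True.
- value_if_false: the value used where the condition is False.

Both values can be scalars or arrays/Series with the same length as the condition. np.where returns a NumPy ndarray, not a pandas Series, so index labels are not carried over; assigning the result to a DataFrame column places the values by position.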
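np.where hands back a plain NumPy array even when its inputs are pandas objects. A quick check, using a small Series with hypothetical labels 'a' to 'c' so the dropped index is visible:

```python
import numpy as np
import pandas as pd

# A Series with a non-default index, to show what np.where does with it
scores = pd.Series([85, 42, 73], index=['a', 'b', 'c'])

out = np.where(scores >= 60, 'Pass', 'Fail')
print(type(out))  # <class 'numpy.ndarray'>: the index labels are gone
print(out)        # ['Pass' 'Fail' 'Pass']

# Wrap the array in a Series to reattach the original index labels
labeled = pd.Series(out, index=scores.index)
print(labeled)
```

When the result is assigned straight to a column of the same DataFrame the condition came from, the wrapping step is unnecessary, because the row order already matches.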
Example
This example shows how to create a new column in a pandas DataFrame using np.where. We check if values in the 'score' column are greater than or equal to 60, then assign 'Pass' or 'Fail' accordingly.
```python
import pandas as pd
import numpy as np

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'score': [85, 42, 73, 58]}
df = pd.DataFrame(data)

# 'Pass' where score >= 60, 'Fail' everywhere else
df['result'] = np.where(df['score'] >= 60, 'Pass', 'Fail')
print(df)
```
Output
```
      name  score result
0    Alice     85   Pass
1      Bob     42   Fail
2  Charlie     73   Pass
3    David     58   Fail
```
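np.where can also modify an existing column rather than create a new one. A minimal sketch: overwrite the column and pass the column itself as value_if_false, so rows that fail the condition keep their original value. The floor of 50 is an arbitrary rule chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'score': [85, 42, 73, 58]})

# Hypothetical rule: raise any score below 50 to 50.
# value_if_false is the column itself, so scores >= 50 are unchanged.
df['score'] = np.where(df['score'] < 50, 50, df['score'])
print(df)
```

Only Bob's score changes (42 becomes 50); the other rows are copied through from df['score'] element by element.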
Common Pitfalls
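Common mistakes when using np.where with pandas include:
- Passing a condition that is not boolean. NumPy generally does not raise an error for a numeric condition; it treats every non-zero value as True, so the result can be silently wrong.
- Passing a scalar condition instead of a boolean Series. An expression such as 70 > 60 evaluates to a single True, and that one result is applied to every row.
- Forgetting to assign the result back to a DataFrame column. np.where returns a new array and never modifies the DataFrame in place.

Build the condition from the same DataFrame (for example df['score'] > 60) so it has one boolean per row in the same order. np.where matches values by position and ignores index labels.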
```python
import pandas as pd
import numpy as np

data = {'score': [70, 50, 90]}
df = pd.DataFrame(data)

# Wrong: the condition is a scalar, not a Series.
# 70 > 60 is a single True, so this assigns 'Pass' to every row:
# df['result'] = np.where(70 > 60, 'Pass', 'Fail')

# Correct: the condition is a boolean Series with one value per row
condition = df['score'] > 60
df['result'] = np.where(condition, 'Pass', 'Fail')
print(df)
```
Output
```
   score result
0     70   Pass
1     50   Fail
2     90   Pass
```
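A non-boolean condition and a condition in the wrong row order can both produce wrong results without raising an error. A minimal sketch of each, on a small made-up set of scores; the sorted copy stands in for any condition computed from a reordered version of the data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [70, 50, 40]})

# Pitfall 1: a numeric (non-boolean) condition does not raise an error.
# NumPy treats every non-zero score as True, so all rows "pass".
truthy = np.where(df['score'], 'Pass', 'Fail')
print(truthy)  # ['Pass' 'Pass' 'Pass']

# Pitfall 2: a condition built from a sorted copy has the right labels
# but a different row order. np.where ignores labels and uses position.
sorted_cond = df.sort_values('score')['score'] > 60
misaligned = np.where(sorted_cond, 'Pass', 'Fail')
print(misaligned)  # ['Fail' 'Fail' 'Pass']: the 70 is marked 'Fail'

# Reindexing to df's index restores the original row order first
aligned = np.where(sorted_cond.reindex(df.index), 'Pass', 'Fail')
print(aligned)  # ['Pass' 'Fail' 'Fail']
```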
Quick Reference
| Parameter | Description |
|---|---|
| condition | Boolean Series or array with one value per row, usually a comparison on a column |
| value_if_true | Scalar, or array/Series of the same length, used where condition is True |
| value_if_false | Scalar, or array/Series of the same length, used where condition is False |
Key Takeaways
Use np.where with a boolean condition on pandas columns to apply element-wise if-else logic.
Build the condition from the same DataFrame so it has one boolean per row in the same order; np.where works by position and ignores index labels.
Assign the output of np.where back to a DataFrame column to save the result.
np.where is vectorized: it evaluates the whole column in one call, so conditional columns need no Python loop or apply.
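For comparison, here is the row-by-row list comprehension that np.where replaces, applied to the scores from the example above. Both produce the same labels; np.where simply evaluates the comparison on the whole column at once. The speed difference depends on data size and is not measured here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'score': [85, 42, 73, 58]})

# Row-by-row version: one Python-level if-else per element
looped = ['Pass' if s >= 60 else 'Fail' for s in df['score']]

# Vectorized version: a single np.where call over the whole column
vectorized = np.where(df['score'] >= 60, 'Pass', 'Fail')

print(looped == vectorized.tolist())  # True
```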