How to Use np.where with pandas for Conditional Selection

Use np.where(condition, value_if_true, value_if_false) with pandas columns to create new columns or modify data based on conditions. It works like an if-else statement applied element-wise on pandas Series or DataFrame columns.

Syntax

The basic syntax of np.where is:

  • condition: A boolean expression applied to pandas columns.
  • value_if_true: The value assigned if the condition is true.
  • value_if_false: The value assigned if the condition is false.

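This returns a new NumPy array (not a Series) with values chosen element-wise based on the condition. Assigning that array to a DataFrame column stores it as a regular pandas column.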

python
np.where(condition, value_if_true, value_if_false)
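To make the three parameters concrete, here is a minimal sketch using a small made-up Series called scores (the data and variable names are assumptions for illustration). It shows that both branches can be scalars or array-like values, and that the result is a NumPy array.

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 42, 73], name="score")  # hypothetical sample data

# Scalars for both branches: each element becomes "high" or "low".
labels = np.where(scores > 50, "high", "low")

# Array-like branch: where the condition is False, the original value is kept.
capped = np.where(scores > 80, 80, scores)

print(type(labels).__name__)  # ndarray
print(labels.tolist())        # ['high', 'low', 'high']
print(capped.tolist())        # [80, 42, 73]
```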

Example

This example shows how to create a new column in a pandas DataFrame using np.where. We check if values in the 'score' column are greater than or equal to 60, then assign 'Pass' or 'Fail' accordingly.

python
import pandas as pd
import numpy as np

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'score': [85, 42, 73, 58]}
df = pd.DataFrame(data)

df['result'] = np.where(df['score'] >= 60, 'Pass', 'Fail')
print(df)
Output
      name  score result
0    Alice     85   Pass
1      Bob     42   Fail
2  Charlie     73   Pass
3    David     58   Fail
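The same pattern extends to more than one condition. The sketch below reuses the example data and adds two hypothetical columns, band and grade (names and thresholds are assumptions, not part of the original example). It combines comparisons with & and nests np.where for a three-way split.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Charlie", "David"],
                   "score": [85, 42, 73, 58]})

# Each comparison needs its own parentheses because & binds tighter than >=.
in_band = (df["score"] >= 60) & (df["score"] < 80)
df["band"] = np.where(in_band, "Mid", "Other")

# Nesting np.where gives a three-way split (hypothetical grade labels).
df["grade"] = np.where(df["score"] >= 80, "A",
                       np.where(df["score"] >= 60, "B", "C"))
print(df)
```

Nesting stays readable for two or three branches; beyond that, a different approach is usually easier to maintain.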

Common Pitfalls

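Common mistakes when using np.where with pandas include:

  • Using a condition that is not boolean. np.where evaluates non-boolean numbers by truthiness (any non-zero value counts as true), which usually gives unintended results rather than an error.
  • Passing a scalar condition instead of an array or Series, so every row receives the same value.
  • Forgetting to assign the result back to a DataFrame column.

Always ensure the condition is a boolean Series with the same length and row order as the DataFrame. np.where works by position and does not align on index labels.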

python
import pandas as pd
import numpy as np

data = {'score': [70, 50, 90]}
df = pd.DataFrame(data)

# Wrong: condition is a scalar, not a Series
# df['result'] = np.where(70 > 60, 'Pass', 'Fail')  # This assigns 'Pass' to all rows

# Correct: condition is a Series
condition = df['score'] > 60
df['result'] = np.where(condition, 'Pass', 'Fail')
print(df)
Output
   score result
0     70   Pass
1     50   Fail
2     90   Pass
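A subtler pitfall is row order. np.where reads the condition by position and returns a plain array, so index labels play no part. The sketch below uses a made-up DataFrame with a string index and a deliberately reordered condition (both are assumptions for illustration) to show how the result can end up on the wrong rows, and one way to realign it first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [70, 50, 40]}, index=["a", "b", "c"])

# A condition whose rows are in a different order than df (hypothetical case).
shuffled = (df["score"] > 60).loc[["c", "b", "a"]]  # False, False, True

# Positional: row "a" would receive the label computed for row "c".
by_position = np.where(shuffled, "Pass", "Fail")

# Reindexing the condition to df's index restores the intended order.
aligned = np.where(shuffled.reindex(df.index), "Pass", "Fail")

print(by_position.tolist())  # ['Fail', 'Fail', 'Pass']
print(aligned.tolist())      # ['Pass', 'Fail', 'Fail']
```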

Quick Reference

Parameter        Description
condition        Boolean expression applied to pandas Series or DataFrame columns
value_if_true    Value assigned where condition is True
value_if_false   Value assigned where condition is False

Key Takeaways

  • Use np.where with a boolean condition on pandas columns to apply element-wise if-else logic.
  • Always ensure the condition is a boolean array or Series with the same length and row order as the DataFrame.
  • Assign the output of np.where back to a DataFrame column to save the result.
  • np.where is efficient for creating new columns based on conditions without loops.
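
To illustrate the last point, the following sketch compares an explicit Python loop with a single np.where call on the example scores. It only checks that both produce the same labels; it does not measure speed, and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [85, 42, 73, 58]})

# Explicit Python loop over the values, one element at a time.
looped = ["Pass" if s >= 60 else "Fail" for s in df["score"]]

# Vectorized equivalent: one call over the whole column.
vectorized = np.where(df["score"] >= 60, "Pass", "Fail")

print(looped == vectorized.tolist())  # True
```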