How to Create a Column from a Condition in a pandas DataFrame
In pandas, you can create a new column from a condition with np.where() or with boolean indexing through df.loc. Either approach assigns a value to each row depending on whether that row meets the condition.
Syntax
Use np.where(condition, value_if_true, value_if_false) to create a new column based on a condition. Alternatively, use boolean indexing with df.loc[condition, 'new_column'] = value.
- condition: A boolean expression applied to DataFrame rows.
- value_if_true: Value assigned if condition is True.
- value_if_false: Value assigned if condition is False.
python
import numpy as np
import pandas as pd

# Example usage:
# df['new_column'] = np.where(condition, value_if_true, value_if_false)
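The df.loc form described above can be sketched alongside np.where. This is a minimal illustration, assuming a hypothetical DataFrame with a 'score' column and a pass mark of 50; neither comes from the original text.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({'score': [35, 50, 72]})

# Form 1: np.where fills every row in a single expression
df['result_where'] = np.where(df['score'] >= 50, 'pass', 'fail')

# Form 2: df.loc assigns only to the rows selected by each mask
passed = df['score'] >= 50
df.loc[passed, 'result_loc'] = 'pass'
df.loc[~passed, 'result_loc'] = 'fail'

print(df)
```

Both columns end up with the same labels; np.where is shorter when there are exactly two outcomes, while df.loc reads naturally when rows are updated one condition at a time.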
Example
This example shows how to create a new column 'Status' that labels rows as 'Adult' if the 'Age' is 18 or more, otherwise 'Minor'.
python
import pandas as pd
import numpy as np

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [17, 20, 15, 22]}
df = pd.DataFrame(data)

# Label each row based on the 'Age' column
df['Status'] = np.where(df['Age'] >= 18, 'Adult', 'Minor')

print(df)
Output
Name Age Status
0 Alice 17 Minor
1 Bob 20 Adult
2 Charlie 15 Minor
3 David 22 Adult
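To see what np.where receives in the example above, it can help to look at the condition on its own. A minimal sketch, reusing the same data; the variable name is_adult is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [17, 20, 15, 22]})

# Comparing a column to a scalar yields a boolean Series, one value per row
is_adult = df['Age'] >= 18
print(is_adult.tolist())  # [False, True, False, True]
```

np.where then picks 'Adult' wherever this mask is True and 'Minor' wherever it is False.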
Common Pitfalls
Common mistakes include:
- Using assignment without np.where or loc, which can cause errors or unexpected results.
- Forgetting to import numpy when using np.where.
- Using chained assignment, which may not update the DataFrame correctly.
python
import pandas as pd

data = {'Age': [16, 21]}
df = pd.DataFrame(data)

# Wrong: chained assignment. Here it raises KeyError because 'Status' does not
# exist yet; on an existing column it may modify a temporary copy instead of df.
# df['Status'][df['Age'] >= 18] = 'Adult'

# Right: use loc for conditional assignment
df.loc[df['Age'] >= 18, 'Status'] = 'Adult'
df.loc[df['Age'] < 18, 'Status'] = 'Minor'

print(df)
Output
Age Status
0 16 Minor
1 21 Adult
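The failure of the commented-out line can be checked directly. In this sketch, which assumes the same two-row data, the chained form fails before any assignment happens because df['Status'] is looked up first and that column does not exist yet.

```python
import pandas as pd

df = pd.DataFrame({'Age': [16, 21]})

error_name = None
try:
    # df['Status'] is evaluated first; the column is missing, so this raises
    df['Status'][df['Age'] >= 18] = 'Adult'
except KeyError as exc:
    error_name = type(exc).__name__
    print(f"Chained assignment failed: {error_name}")

# The DataFrame is unchanged: no 'Status' column was created
print(list(df.columns))
```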
Quick Reference
| Method | Description | Example |
|---|---|---|
| np.where | Create column with condition in one line | df['new_col'] = np.where(df['col'] > 0, 'Yes', 'No') |
| loc assignment | Assign values conditionally using loc | df.loc[df['col'] > 0, 'new_col'] = 'Yes' |
| Boolean indexing | Assign values directly with boolean mask | df['new_col'] = 'No'; df.loc[df['col'] > 0, 'new_col'] = 'Yes' |
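One difference between the rows of the table is worth seeing in code: a single df.loc assignment fills only the rows that match, while setting a default first and then overriding fills every row. A minimal sketch, assuming a hypothetical numeric column 'col':

```python
import pandas as pd

df = pd.DataFrame({'col': [-1, 2, 0]})

# Single loc assignment: rows that fail the condition are left as missing values
df.loc[df['col'] > 0, 'only_loc'] = 'Yes'

# Default first, then override the matching rows
df['with_default'] = 'No'
df.loc[df['col'] > 0, 'with_default'] = 'Yes'

print(df['only_loc'].isna().tolist())   # [True, False, True]
print(df['with_default'].tolist())      # ['No', 'Yes', 'No']
```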
Key Takeaways
Use np.where(condition, value_if_true, value_if_false) to create a new column based on a condition.
Boolean indexing with df.loc is a safe way to assign values conditionally.
Always import numpy as np when using np.where.
Avoid chained assignment to prevent unexpected DataFrame updates.
Test your condition logic on a small DataFrame to ensure correctness.
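The last takeaway can be put into practice with a few boundary values. This sketch assumes the same 'Adult'/'Minor' rule as the example and uses a hypothetical helper name, label_status; the ages 17, 18, and 19 are chosen to sit just below, at, and just above the threshold.

```python
import numpy as np
import pandas as pd

def label_status(df):
    """Return 'Adult' where Age >= 18, otherwise 'Minor' (hypothetical helper)."""
    return np.where(df['Age'] >= 18, 'Adult', 'Minor')

# Small frame with values just below, at, and just above the threshold
check = pd.DataFrame({'Age': [17, 18, 19]})
check['Status'] = label_status(check)

# Fails loudly if the comparison operator or threshold is wrong
assert check['Status'].tolist() == ['Minor', 'Adult', 'Adult']
print(check)
```

Writing > instead of >= would label the age-18 row as 'Minor', and the assertion would catch it.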