
How to Create a Column from Condition in pandas DataFrame

In pandas, you can create a new column from a condition by assigning the result of np.where() to df['new_column'], or by using boolean indexing with df.loc. Either way, each row gets a value that depends on whether it meets the condition.

Syntax

Use np.where(condition, value_if_true, value_if_false) to create a new column based on a condition. Alternatively, use boolean indexing with df.loc[condition, 'new_column'] = value.

  • condition: A boolean expression applied to DataFrame rows.
  • value_if_true: Value assigned if condition is True.
  • value_if_false: Value assigned if condition is False.
python
import numpy as np
import pandas as pd

# Example usage:
# df['new_column'] = np.where(condition, value_if_true, value_if_false)
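
As a minimal sketch of the two forms described above, assuming a hypothetical DataFrame with a single numeric column named score and an arbitrary threshold of 50 (neither comes from this article's example), both approaches should yield the same labels:

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'score': [35, 50, 72]})

# The condition is a boolean Series with one True/False per row
condition = df['score'] >= 50

# Form 1: np.where picks a value for every row in one step
df['result_where'] = np.where(condition, 'pass', 'fail')

# Form 2: set a default, then overwrite the rows where the condition holds
df['result_loc'] = 'fail'
df.loc[condition, 'result_loc'] = 'pass'

print(df)
```

np.where tends to be the shorter choice when there are exactly two outcomes, while the loc form makes it easy to add further overrides later.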

Example

This example shows how to create a new column 'Status' that labels rows as 'Adult' if the 'Age' is 18 or more, otherwise 'Minor'.

python
import pandas as pd
import numpy as np

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [17, 20, 15, 22]}
df = pd.DataFrame(data)

df['Status'] = np.where(df['Age'] >= 18, 'Adult', 'Minor')
print(df)
Output
      Name  Age Status
0    Alice   17  Minor
1      Bob   20  Adult
2  Charlie   15  Minor
3    David   22  Adult
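
The condition does not have to be a single comparison. The sketch below reuses the same data and combines two comparisons with &; the 18-21 age band and the 'Group' column name are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [17, 20, 15, 22]}
df = pd.DataFrame(data)

# Combine conditions with & (and) or | (or).
# Each comparison needs its own parentheses because & binds tighter than >=.
in_band = (df['Age'] >= 18) & (df['Age'] <= 21)

# 'Group' and the 18-21 band are hypothetical, chosen for illustration
df['Group'] = np.where(in_band, '18-21', 'Other')

print(df)
```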

Common Pitfalls

Common mistakes include:

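  • Assigning the condition itself (df['new_column'] = condition) without np.where or loc, which stores True/False values instead of the labels you want.
  • Forgetting to import numpy when using np.where, which raises a NameError.
  • Using chained assignment such as df['col'][mask] = value, which may modify a temporary copy instead of the DataFrame.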
python
import pandas as pd

data = {'Age': [16, 21]}
df = pd.DataFrame(data)

# Wrong: chained assignment. 'Status' does not exist yet, so this raises a KeyError;
# even on an existing column it may not update df.
# df['Status'][df['Age'] >= 18] = 'Adult'

# Right: Use loc for conditional assignment
df.loc[df['Age'] >= 18, 'Status'] = 'Adult'
df.loc[df['Age'] < 18, 'Status'] = 'Minor'

print(df)
Output
   Age Status
0   16  Minor
1   21  Adult
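
To make two of the pitfalls listed above concrete, here is a small sketch on the same two-row data. The 'IsAdult' column name and the chained_error variable are hypothetical and exist only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Age': [16, 21]})

# Pitfall: assigning the condition itself stores True/False, not labels
df['IsAdult'] = df['Age'] >= 18
print(df['IsAdult'].tolist())

# Pitfall: chained indexing on a column that does not exist yet.
# df['Status'] is looked up first, and that lookup fails.
chained_error = None
try:
    df['Status'][df['Age'] >= 18] = 'Adult'
except KeyError as exc:
    chained_error = exc
    print('KeyError:', exc)
```

A boolean column is sometimes exactly what you need; it only becomes a mistake when the goal was a label such as 'Adult' or 'Minor'.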

Quick Reference

Method | Description | Example
np.where | Create column with condition in one line | df['new_col'] = np.where(df['col'] > 0, 'Yes', 'No')
loc assignment | Assign values conditionally using loc | df.loc[df['col'] > 0, 'new_col'] = 'Yes'
Boolean indexing | Assign values directly with boolean mask | df['new_col'] = 'No'; df.loc[df['col'] > 0, 'new_col'] = 'Yes'
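
One difference between the last two rows is worth seeing in practice: a single loc assignment only fills the matching rows. The sketch below assumes a hypothetical two-row DataFrame, and the column names no_default and with_default are made up for the comparison.

```python
import pandas as pd

# Hypothetical data: one row fails the condition, one row passes
df = pd.DataFrame({'col': [-1, 2]})

# A single loc assignment fills only the matching rows;
# the remaining rows are left as missing values
df.loc[df['col'] > 0, 'no_default'] = 'Yes'

# Setting a default first gives every row a value
df['with_default'] = 'No'
df.loc[df['col'] > 0, 'with_default'] = 'Yes'

print(df)
```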

Key Takeaways

  • Use np.where(condition, value_if_true, value_if_false) to create a new column based on a condition.
  • Boolean indexing with df.loc is a safe way to assign values conditionally.
  • Always import numpy as np when using np.where.
  • Avoid chained assignment to prevent unexpected DataFrame updates.
  • Test your condition logic on a small DataFrame to ensure correctness.
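
One way to follow the last takeaway is a quick check on a tiny DataFrame that includes the boundary value. This is only a sketch; the test_df name and the chosen ages are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Small test frame that includes the boundary value 18
test_df = pd.DataFrame({'Age': [17, 18, 19]})
test_df['Status'] = np.where(test_df['Age'] >= 18, 'Adult', 'Minor')

# The row with Age == 18 should be 'Adult' because the check uses >=
assert test_df['Status'].tolist() == ['Minor', 'Adult', 'Adult']
print(test_df)
```

If the comparison were accidentally written as > instead of >=, the assertion would fail on the middle row, which is the kind of off-by-one slip a boundary case tends to catch.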