How to Normalize a Column in Pandas: Simple Guide
To normalize a column in pandas, use min-max scaling with (x - x.min()) / (x.max() - x.min()) or z-score standardization with (x - x.mean()) / x.std(). Apply either formula directly to the DataFrame column: min-max scaling maps values to the range 0 to 1, and z-score standardization gives the column zero mean and unit variance.
Syntax
Normalization in pandas can be done using simple formulas applied to a DataFrame column.
- Min-Max Scaling: df['col'] = (df['col'] - df['col'].min()) / (df['col'].max() - df['col'].min()) scales values between 0 and 1.
- Z-Score Standardization: df['col'] = (df['col'] - df['col'].mean()) / df['col'].std() centers data around 0 with standard deviation 1.
python
df['normalized'] = (df['column'] - df['column'].min()) / (df['column'].max() - df['column'].min())
df['standardized'] = (df['column'] - df['column'].mean()) / df['column'].std()
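The two formulas above can be wrapped in small reusable functions. The sketch below is one possible way to do that, not a pandas API: the names min_max_scale and z_score are hypothetical, and returning zeros for a constant column (where max equals min and the formula would divide by zero) is an assumed convention.

```python
import pandas as pd


def min_max_scale(col: pd.Series) -> pd.Series:
    """Scale a numeric Series to the range 0 to 1 (hypothetical helper)."""
    value_range = col.max() - col.min()
    if value_range == 0:
        # Constant column: the formula would compute 0 / 0.
        # Returning zeros is an assumed convention, not a pandas rule.
        return pd.Series(0.0, index=col.index)
    return (col - col.min()) / value_range


def z_score(col: pd.Series) -> pd.Series:
    """Center a numeric Series on 0 and divide by its standard deviation."""
    return (col - col.mean()) / col.std()


sample = pd.Series([2, 4, 6])
scaled = min_max_scale(sample)
standardized = z_score(sample)
print(scaled.tolist())
print(standardized.tolist())
```

Both helpers return a new Series, so the original column is left unchanged unless you assign the result back to it.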
Example
This example shows how to normalize a column using min-max scaling and z-score standardization in pandas.
python
import pandas as pd

data = {'score': [50, 80, 90, 100, 60]}
df = pd.DataFrame(data)

# Min-Max Normalization
df['min_max_norm'] = (df['score'] - df['score'].min()) / (df['score'].max() - df['score'].min())

# Z-Score Standardization
df['z_score_norm'] = (df['score'] - df['score'].mean()) / df['score'].std()

print(df)
Output
   score  min_max_norm  z_score_norm
0     50           0.0     -1.253831
1     80           0.6      0.192897
2     90           0.8      0.675140
3    100           1.0      1.157383
4     60           0.2     -0.771589
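One detail worth knowing about the z-score column: Series.std() in pandas computes the sample standard deviation (ddof=1) by default, whereas NumPy's std defaults to the population version (ddof=0). The sketch below compares the two on the same scores; which one is appropriate depends on your use case, so treat it as an illustration, not a recommendation.

```python
import pandas as pd

scores = pd.Series([50, 80, 90, 100, 60])

# pandas default: sample standard deviation (divides by n - 1)
sample_std = scores.std()
# population standard deviation (divides by n)
population_std = scores.std(ddof=0)

z_sample = (scores - scores.mean()) / sample_std
z_population = (scores - scores.mean()) / population_std

print(sample_std, population_std)
print(z_sample.round(6).tolist())
print(z_population.round(6).tolist())
```

Each set of z-scores has a standard deviation of 1 only when measured with the same ddof that was used to compute it.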
Common Pitfalls
Common mistakes when normalizing columns include:
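- Not handling missing values before normalization. pandas skips NaN when computing min, max, mean, and std, so no error is raised, but the missing entries stay NaN in the normalized column.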
- Normalizing categorical or non-numeric columns by mistake.
- Normalizing the wrong column, or overwriting the original data unintentionally.
Always check data types and clean data before normalizing.
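As a rough illustration of the data type check, the sketch below uses select_dtypes to pick out only the numeric columns and applies min-max scaling to a copy, so the original DataFrame is preserved. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [50, 75, 100],
    "age": [20, 30, 40],
})

# Select only numeric columns; "name" is left out
numeric_cols = df.select_dtypes(include="number").columns

# Work on a copy so the original values stay available
normalized = df.copy()
col_min = df[numeric_cols].min()
col_range = df[numeric_cols].max() - col_min
normalized[numeric_cols] = (df[numeric_cols] - col_min) / col_range

print(normalized)
```

This sketch assumes no numeric column is constant; a column whose max equals its min would divide by zero and produce NaN.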
python
import pandas as pd

data = {'score': [50, None, 90, 100, 60]}
df = pd.DataFrame(data)

# Wrong: normalizing without handling NaN
# df['norm'] = (df['score'] - df['score'].min()) / (df['score'].max() - df['score'].min())
# This runs without error, but the NaN stays in the result

# Right: fill NaN (here with the column mean) before normalization
df['score_filled'] = df['score'].fillna(df['score'].mean())
df['norm'] = (df['score_filled'] - df['score_filled'].min()) / (df['score_filled'].max() - df['score_filled'].min())

print(df)
Output
   score  score_filled  norm
0   50.0          50.0   0.0
1    NaN          75.0   0.5
2   90.0          90.0   0.8
3  100.0         100.0   1.0
4   60.0          60.0   0.2
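Filling with the mean is only one way to handle missing values. If an imputed value would be misleading, an alternative is to leave the gap in place: min() and max() skip NaN by default, so the remaining values are still scaled correctly. A minimal sketch, assuming it is acceptable for the missing row to stay NaN:

```python
import pandas as pd

df = pd.DataFrame({"score": [50, None, 90, 100, 60]})
col = df["score"]

# min() and max() ignore NaN by default (skipna=True),
# so only the missing row stays NaN in the result
df["norm"] = (col - col.min()) / (col.max() - col.min())

missing_after = int(df["norm"].isna().sum())
print(df)
print("rows still missing:", missing_after)
```

Counting the NaN rows afterwards makes the choice explicit, so missing data is not carried forward unnoticed.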
Quick Reference
Summary tips for normalizing columns in pandas:
- Use min-max scaling to scale values between 0 and 1.
- Use z-score standardization to center data with mean 0 and std 1.
- Handle missing values before normalization.
- Only normalize numeric columns.
- Keep original data if you need to compare later.
Key Takeaways
Normalize pandas columns using min-max scaling or z-score standardization formulas.
Handle missing values before normalizing; otherwise they carry through as NaN in the normalized column.
Normalize only numeric columns to get meaningful results.
Keep a copy of original data if you want to compare before and after normalization.
Min-max scales data between 0 and 1; z-score centers data around zero with unit variance.