
How to Normalize a Column in Pandas: Simple Guide

To normalize a column in pandas, use min-max scaling with (x - x.min()) / (x.max() - x.min()) or z-score standardization with (x - x.mean()) / x.std(). Applied directly to a DataFrame column, the first formula rescales values to the range 0 to 1, and the second gives the column zero mean and unit standard deviation.

Syntax

Normalization in pandas can be done using simple formulas applied to a DataFrame column.

  • Min-Max Scaling: df['col'] = (df['col'] - df['col'].min()) / (df['col'].max() - df['col'].min()) scales values between 0 and 1.
  • Z-Score Standardization: df['col'] = (df['col'] - df['col'].mean()) / df['col'].std() centers data around 0 with standard deviation 1. Note that std() in pandas computes the sample standard deviation (ddof=1) by default.
python
df['normalized'] = (df['column'] - df['column'].min()) / (df['column'].max() - df['column'].min())
df['standardized'] = (df['column'] - df['column'].mean()) / df['column'].std()
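The two formulas above can also be wrapped in small helper functions so they are easy to reuse across columns. This is a minimal sketch, not a pandas API: the function names min_max_scale and z_score, the ddof parameter on the helper, and the sample data are all assumptions made for illustration.

```python
import pandas as pd


def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale a numeric Series so its minimum maps to 0 and its maximum to 1."""
    return (series - series.min()) / (series.max() - series.min())


def z_score(series: pd.Series, ddof: int = 1) -> pd.Series:
    """Center a numeric Series on 0 and divide by its standard deviation.

    Series.std() defaults to ddof=1 (sample standard deviation);
    pass ddof=0 to use the population standard deviation instead.
    """
    return (series - series.mean()) / series.std(ddof=ddof)


# Hypothetical sample data for illustration
df = pd.DataFrame({'column': [10, 20, 30, 40]})
df['normalized'] = min_max_scale(df['column'])
df['standardized'] = z_score(df['column'])
print(df)
```

Writing the results to new columns, as here, leaves the original column untouched for later comparison.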

Example

This example shows how to normalize a column using min-max scaling and z-score standardization in pandas.

python
import pandas as pd

data = {'score': [50, 80, 90, 100, 60]}
df = pd.DataFrame(data)

# Min-Max Normalization
df['min_max_norm'] = (df['score'] - df['score'].min()) / (df['score'].max() - df['score'].min())

# Z-Score Standardization
df['z_score_norm'] = (df['score'] - df['score'].mean()) / df['score'].std()

print(df)
Output
   score  min_max_norm  z_score_norm
0     50           0.0     -1.253831
1     80           0.6      0.192897
2     90           0.8      0.675140
3    100           1.0      1.157383
4     60           0.2     -0.771589
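A quick way to sanity-check a normalized column is to look at its summary statistics: a min-max scaled column should have minimum 0 and maximum 1, and a z-score standardized column should have mean close to 0 and standard deviation close to 1. The sketch below recomputes both columns from the example data and summarizes them; the variable names min_max, z, and summary are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'score': [50, 80, 90, 100, 60]})

# Same formulas as in the example above
min_max = (df['score'] - df['score'].min()) / (df['score'].max() - df['score'].min())
z = (df['score'] - df['score'].mean()) / df['score'].std()

# Collect min, max, mean, and std of each normalized column side by side
stats = ['min', 'max', 'mean', 'std']
summary = pd.DataFrame({
    'min_max_norm': min_max.agg(stats),
    'z_score_norm': z.agg(stats),
})
print(summary.round(6))
```

Because of floating-point rounding, the mean of the z-score column may print as a very small number rather than exactly zero.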

Common Pitfalls

Common mistakes when normalizing columns include:

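  • Not handling missing values before normalization. pandas skips NaN when computing min(), max(), mean(), and std(), so no error is raised, but the NaN entries stay in the normalized column.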
  • Normalizing categorical or non-numeric columns by mistake.
  • Forgetting to apply normalization on the correct column or overwriting original data unintentionally.

Always check data types and clean data before normalizing.

python
import pandas as pd

data = {'score': [50, None, 90, 100, 60]}
df = pd.DataFrame(data)

# Wrong: Normalizing without handling NaN
# df['norm'] = (df['score'] - df['score'].min()) / (df['score'].max() - df['score'].min())  # This will work but NaN stays

# Right: Fill NaN before normalization
df['score_filled'] = df['score'].fillna(df['score'].mean())
df['norm'] = (df['score_filled'] - df['score_filled'].min()) / (df['score_filled'].max() - df['score_filled'].min())

print(df)
Output
   score  score_filled  norm
0   50.0          50.0   0.0
1    NaN          75.0   0.5
2   90.0          90.0   0.8
3  100.0         100.0   1.0
4   60.0          60.0   0.2
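The other pitfalls in the list can be guarded against in a similar way. The sketch below, using a made-up DataFrame, selects only numeric columns with select_dtypes, writes each result to a new column instead of overwriting the original, and shows that a missing value passes through as NaN when it is not filled first. The column names and the _norm suffix are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data: one text column, one numeric column with a gap, one clean numeric column
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e'],
    'score': [50, None, 90, 100, 60],
    'age': [20, 30, 40, 50, 60],
})

# Pick numeric columns only, so 'name' is never passed to the formula
numeric_cols = df.select_dtypes(include='number').columns

for col in numeric_cols:
    values = df[col]
    # min() and max() skip NaN by default, so the scaling itself does not fail
    scaled = (values - values.min()) / (values.max() - values.min())
    # Store the result under a new name to keep the original column intact
    df[f'{col}_norm'] = scaled

print(df)
print('NaN left in score_norm:', df['score_norm'].isna().sum())
```

Whether to fill, drop, or keep the remaining NaN depends on the analysis; filling with the mean, as in the example above, is only one option.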

Quick Reference

Summary tips for normalizing columns in pandas:

  • Use min-max scaling to scale values between 0 and 1.
  • Use z-score standardization to center data with mean 0 and std 1.
  • Handle missing values before normalization.
  • Only normalize numeric columns.
  • Keep original data if you need to compare later.

Key Takeaways

Normalize pandas columns using min-max scaling or z-score standardization formulas.
Always handle missing values before normalizing so that NaN entries do not carry through into the normalized column.
Normalize only numeric columns to get meaningful results.
Keep a copy of original data if you want to compare before and after normalization.
Min-max scales data between 0 and 1; z-score centers data around zero with unit variance.