How to Normalize a Column in Pandas: Simple Guide

To normalize a column in pandas, use min-max scaling, (x - x.min()) / (x.max() - x.min()), or z-score standardization, (x - x.mean()) / x.std(). Applied directly to a DataFrame column, min-max scaling maps the values onto the range 0 to 1, while z-score standardization gives them zero mean and unit standard deviation.

📐

Syntax

Normalization in pandas can be done using simple formulas applied to a DataFrame column.

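Min-Max Scaling: df['col'] = (df['col'] - df['col'].min()) / (df['col'].max() - df['col'].min()) scales values to the range 0 to 1.
Z-Score Standardization: df['col'] = (df['col'] - df['col'].mean()) / df['col'].std() centers the data on 0 with standard deviation 1. Note that std() in pandas computes the sample standard deviation (ddof=1) by default.

Both forms above overwrite df['col']; assign the result to a new column name if you want to keep the original values.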

python

df['normalized'] = (df['column'] - df['column'].min()) / (df['column'].max() - df['column'].min())
df['standardized'] = (df['column'] - df['column'].mean()) / df['column'].std()
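If you apply these formulas to several columns, it can help to wrap them in small helper functions. The sketch below is one possible way to do that; the names `min_max_scale` and `z_score` are hypothetical helpers, not part of pandas, and the three-row DataFrame is made up for illustration.

```python
import pandas as pd


def min_max_scale(s: pd.Series) -> pd.Series:
    """Rescale a numeric Series to the range 0 to 1 (hypothetical helper)."""
    return (s - s.min()) / (s.max() - s.min())


def z_score(s: pd.Series) -> pd.Series:
    """Center a numeric Series on 0 and divide by its standard deviation."""
    # Series.std() uses ddof=1 (sample standard deviation) by default.
    return (s - s.mean()) / s.std()


# Illustrative data only
df = pd.DataFrame({"column": [2.0, 4.0, 6.0]})
df["normalized"] = min_max_scale(df["column"])
df["standardized"] = z_score(df["column"])
```

Writing to new columns, as here, leaves the original `column` values untouched.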

💻

Example

This example shows how to normalize a column using min-max scaling and z-score standardization in pandas.

python

import pandas as pd

data = {'score': [50, 80, 90, 100, 60]}
df = pd.DataFrame(data)

# Min-Max Normalization
df['min_max_norm'] = (df['score'] - df['score'].min()) / (df['score'].max() - df['score'].min())

# Z-Score Standardization
df['z_score_norm'] = (df['score'] - df['score'].mean()) / df['score'].std()

print(df)

Output

   score  min_max_norm  z_score_norm
0     50           0.0     -1.253831
1     80           0.6      0.192897
2     90           0.8      0.675140
3    100           1.0      1.157383
4     60           0.2     -0.771589
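A quick way to sanity-check a result like this is to confirm the properties each method promises. The sketch below rebuilds the same `score` column and checks them with NumPy's `isclose`; the variable names are illustrative. It also shows the `ddof=0` variant of `std()`, in case you want the population standard deviation rather than the pandas default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [50, 80, 90, 100, 60]})
score = df["score"]

min_max = (score - score.min()) / (score.max() - score.min())
z = (score - score.mean()) / score.std()

# Min-max scaling: the smallest value maps to 0 and the largest to 1.
min_max_ok = bool(min_max.min() == 0.0 and min_max.max() == 1.0)

# Z-score: mean is ~0 and sample standard deviation is ~1,
# up to floating-point rounding.
z_ok = bool(np.isclose(z.mean(), 0.0) and np.isclose(z.std(), 1.0))

# Population variant: divide by std(ddof=0) instead of the default ddof=1.
z_pop = (score - score.mean()) / score.std(ddof=0)
```

The two z-score variants differ only by a constant factor, so the ordering of the values is the same in both.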

⚠️

Common Pitfalls

Common mistakes when normalizing columns include:

Not handling missing values before normalization. pandas skips NaN when computing min, max, mean, and std, so no error is raised, but the missing entries stay NaN in the normalized column.
Normalizing categorical or non-numeric columns by mistake.
Applying the formula to the wrong column, or overwriting the original data unintentionally.

Always check data types and clean data before normalizing.
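These checks can be bundled into a small guard function. The following is a minimal sketch, assuming you want non-numeric input to raise an error and a constant column (where max equals min, so the formula would divide by zero) to map to 0.0. The name `safe_min_max` and both of those choices are illustrative, not a pandas convention.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def safe_min_max(s: pd.Series) -> pd.Series:
    """Min-max scale a Series, with basic guards (hypothetical helper)."""
    if not is_numeric_dtype(s):
        raise TypeError(f"cannot normalize non-numeric column of dtype {s.dtype}")
    value_range = s.max() - s.min()
    if value_range == 0:
        # Constant column: avoid 0/0 and return zeros (NaN entries stay NaN).
        return s * 0.0
    return (s - s.min()) / value_range


df = pd.DataFrame({
    "score": [50, None, 90, 100, 60],
    "grade": ["C", "A", "B", "A", "B"],
})

# Numeric column: scaled, with the missing row left as NaN.
df["score_norm"] = safe_min_max(df["score"])

# Non-numeric column: rejected instead of silently producing nonsense.
try:
    safe_min_max(df["grade"])
    grade_rejected = False
except TypeError:
    grade_rejected = True
```

Whether to fill, drop, or keep the NaN rows is a separate decision that depends on your data.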

python

import pandas as pd

data = {'score': [50, None, 90, 100, 60]}
df = pd.DataFrame(data)

# Wrong: normalizing without handling NaN
# df['norm'] = (df['score'] - df['score'].min()) / (df['score'].max() - df['score'].min())  # Runs without error, but the NaN row stays NaN

# Right: Fill NaN before normalization
df['score_filled'] = df['score'].fillna(df['score'].mean())
df['norm'] = (df['score_filled'] - df['score_filled'].min()) / (df['score_filled'].max() - df['score_filled'].min())

print(df)

Output

   score  score_filled  norm
0   50.0          50.0   0.0
1    NaN          75.0   0.5
2   90.0          90.0   0.8
3  100.0         100.0   1.0
4   60.0          60.0   0.2

📊

Quick Reference

Summary tips for normalizing columns in pandas:

Use min-max scaling to scale values between 0 and 1.
Use z-score standardization to center data with mean 0 and std 1.
Handle missing values before normalization.
Only normalize numeric columns.
Keep original data if you need to compare later.
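Several of these tips can be combined when a DataFrame has more than one numeric column. The sketch below is one way to do it, assuming a made-up DataFrame with a text column and two numeric columns: `select_dtypes` picks out the numeric columns, and the scaling is written to a copy so the original stays available.

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [50, 75, 100],
    "age": [20, 30, 40],
})

# Only numeric columns are candidates for normalization.
numeric_cols = df.select_dtypes(include="number").columns

# Work on a copy so the original values remain available for comparison.
scaled = df.copy()
scaled[numeric_cols] = (df[numeric_cols] - df[numeric_cols].min()) / (
    df[numeric_cols].max() - df[numeric_cols].min()
)
```

Each column is scaled with its own min and max, because pandas aligns the per-column statistics by column name.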

✅

Key Takeaways

Normalize pandas columns using min-max scaling or z-score standardization formulas.

Always handle missing values before normalizing; otherwise NaN carries through to the result.

Normalize only numeric columns to get meaningful results.

Keep a copy of original data if you want to compare before and after normalization.

Min-max scales data between 0 and 1; z-score centers data around zero with unit variance.