
How to Normalize Data in Python Using sklearn

To normalize data in Python, use sklearn.preprocessing.Normalizer, which scales each sample (row) to unit norm. Create the normalizer and apply it with fit_transform(); because Normalizer is stateless, calling transform() alone gives the same result.

Syntax

The Normalizer class from sklearn.preprocessing normalizes data row by row. You create an instance with an optional norm parameter ('l1', 'l2', or 'max'; the default is 'l2'), then call fit_transform() to normalize the data in one step. Normalizer is stateless, so fit() learns nothing from the data and transform() behaves identically.

  • Normalizer(norm='l2'): Creates a normalizer object.
  • fit_transform(X): Fits and normalizes the data X.
  • transform(X): Normalizes new data using the fitted normalizer.
python
from sklearn.preprocessing import Normalizer

normalizer = Normalizer(norm='l2')
X_normalized = normalizer.fit_transform(X)
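The norm parameter controls how each row is scaled. A quick sketch of the three options ('l1', 'l2', 'max') on a single two-feature vector:

```python
from sklearn.preprocessing import Normalizer
import numpy as np

X = np.array([[3.0, 4.0]])

# 'l2' (default): divide by sqrt(3^2 + 4^2) = 5
l2 = Normalizer(norm='l2').fit_transform(X)
# 'l1': divide by |3| + |4| = 7
l1 = Normalizer(norm='l1').fit_transform(X)
# 'max': divide by max(|3|, |4|) = 4
mx = Normalizer(norm='max').fit_transform(X)

print(l2)  # [[0.6 0.8]]
print(l1)  # [[0.42857143 0.57142857]]
print(mx)  # [[0.75 1.  ]]
```

Whichever norm you pick, each row is divided by its own length, so rows are scaled independently of one another.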

Example

This example shows how to normalize a small dataset so each row has length 1 using L2 norm.

python
from sklearn.preprocessing import Normalizer
import numpy as np

# Sample data: 3 samples with 3 features each
X = np.array([[4, 1, 2],
              [1, 3, 9],
              [5, 7, 2]])

normalizer = Normalizer(norm='l2')
X_normalized = normalizer.fit_transform(X)

print("Original data:\n", X)
print("\nNormalized data (each row length=1):\n", X_normalized)
Output
Original data:
 [[4 1 2]
 [1 3 9]
 [5 7 2]]

Normalized data (each row length=1):
 [[0.87287156 0.21821789 0.43643578]
 [0.10482848 0.31448545 0.94345635]
 [0.56613852 0.79259392 0.22645541]]

Common Pitfalls

Common mistakes include confusing normalization with standardization, or applying normalization across columns instead of rows. Normalizer scales each sample (row) independently to unit norm, not features (columns). For feature-wise scaling, use StandardScaler or MinMaxScaler.

Sparse data also needs care: Normalizer accepts scipy.sparse CSR input and keeps the output sparse, but feature-wise scalers that center the data (e.g. StandardScaler with its default with_mean=True) would densify it, which is why they reject sparse input unless with_mean=False.
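In practice Normalizer itself handles CSR matrices and keeps them sparse; the dense-output risk comes from transformers that center the data. A minimal sketch (values illustrative) that passes a CSR matrix through Normalizer and checks the result stays sparse:

```python
from scipy.sparse import csr_matrix, issparse
from sklearn.preprocessing import Normalizer
import numpy as np

# Mostly-zero data stored in compressed sparse row format
X_sparse = csr_matrix(np.array([[3.0, 0.0, 4.0],
                                [0.0, 5.0, 0.0]]))

X_norm = Normalizer(norm='l2').fit_transform(X_sparse)

print(issparse(X_norm))   # True: the output is still sparse
print(X_norm.toarray())   # rows [0.6, 0, 0.8] and [0, 1, 0]
```

Only the stored non-zero entries are rescaled, so memory use stays proportional to the number of non-zeros.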

python
from sklearn.preprocessing import Normalizer, StandardScaler
import numpy as np

X = np.array([[4, 1, 2], [1, 3, 9], [5, 7, 2]])

# Wrong: Normalizing columns (features) by transposing
normalizer = Normalizer()
X_wrong = normalizer.fit_transform(X.T).T

# Right: Normalize rows (samples)
X_right = normalizer.fit_transform(X)

print("Wrong normalization (columns):\n", X_wrong)
print("\nRight normalization (rows):\n", X_right)
Output
Wrong normalization (columns):
 [[0.6172134  0.13018891 0.21199958]
 [0.15430335 0.39056673 0.95399811]
 [0.77151675 0.91132238 0.21199958]]

Right normalization (rows):
 [[0.87287156 0.21821789 0.43643578]
 [0.10482848 0.31448545 0.94345635]
 [0.56613852 0.79259392 0.22645541]]

Quick Reference

Method                | Purpose                                              | Notes
----------------------|------------------------------------------------------|--------------------------
Normalizer(norm='l2') | Normalize each sample to unit length                 | Row-wise normalization
StandardScaler()      | Standardize features to zero mean and unit variance  | Feature-wise scaling
MinMaxScaler()        | Scale features to a given range (default 0-1)        | Feature-wise scaling
fit_transform(X)      | Fit to data, then transform it                       | Use on training data
transform(X)          | Transform new data with the fitted scaler            | Use on test or new data
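To make the table concrete, here is a sketch contrasting the three transformers on the same matrix: Normalizer acts per row, while StandardScaler and MinMaxScaler act per column.

```python
from sklearn.preprocessing import Normalizer, StandardScaler, MinMaxScaler
import numpy as np

X = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 9.0],
              [5.0, 7.0, 2.0]])

X_norm = Normalizer().fit_transform(X)      # each ROW has unit L2 norm
X_std = StandardScaler().fit_transform(X)   # each COLUMN has mean 0, std 1
X_minmax = MinMaxScaler().fit_transform(X)  # each COLUMN spans [0, 1]

print(np.linalg.norm(X_norm, axis=1))  # row norms: [1. 1. 1.]
print(X_std.mean(axis=0))              # column means: ~[0. 0. 0.]
print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # [0. 0. 0.] [1. 1. 1.]
```

Checking which axis ends up with the unit property is a quick way to confirm you picked the right transformer.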

Key Takeaways

  • Use sklearn.preprocessing.Normalizer to scale each sample (row) to unit norm.
  • Normalization scales rows independently, unlike standardization, which scales features (columns).
  • Fit on training data with fit_transform, then apply transform to new data.
  • Avoid normalizing columns by mistake; Normalizer works row-wise.
  • For feature-wise scaling, use StandardScaler or MinMaxScaler instead.
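The fit-then-transform pattern from the takeaways can be sketched as follows. Normalizer itself is stateless, so the pattern matters most for stateful scalers like StandardScaler, which must learn its statistics from the training set only (the split values here are illustrative):

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

X_train = np.array([[4.0, 1.0], [1.0, 3.0], [5.0, 7.0]])
X_test = np.array([[2.0, 2.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# Test data is scaled with the TRAINING mean and std, not its own
print(scaler.mean_)  # [3.33333333 3.66666667]
```

Fitting the scaler on the test set as well would leak test statistics into preprocessing and make evaluation optimistic.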