How to Normalize Data in Python Using sklearn
To normalize data in Python, use sklearn.preprocessing.Normalizer, which scales each sample to have unit norm. Fit the normalizer on your data and transform it with the fit_transform() or transform() methods.
Syntax
The Normalizer class from sklearn.preprocessing is used to normalize data. Create an instance with an optional norm parameter ('l1', 'l2', or 'max'; default is 'l2'), then call fit_transform() to fit and normalize in one step, or transform() on an already-fitted instance. Note that Normalizer is stateless (fit() learns nothing from the data), so the fit/transform split exists mainly for API consistency with other scalers.
- Normalizer(norm='l2'): Creates a normalizer object.
- fit_transform(X): Fits and normalizes the data X.
- transform(X): Normalizes new data using the fitted normalizer.
```python
from sklearn.preprocessing import Normalizer

normalizer = Normalizer(norm='l2')
X_normalized = normalizer.fit_transform(X)
```
Example
This example shows how to normalize a small dataset so each row has length 1 using L2 norm.
```python
from sklearn.preprocessing import Normalizer
import numpy as np

# Sample data: 3 samples with 3 features each
X = np.array([[4, 1, 2],
              [1, 3, 9],
              [5, 7, 2]])

normalizer = Normalizer(norm='l2')
X_normalized = normalizer.fit_transform(X)

print("Original data:\n", X)
print("\nNormalized data (each row length=1):\n", X_normalized)
```

Output

```
Original data:
 [[4 1 2]
 [1 3 9]
 [5 7 2]]

Normalized data (each row length=1):
 [[0.87287156 0.21821789 0.43643578]
 [0.10482848 0.31448545 0.94345635]
 [0.56613852 0.79259392 0.22645541]]
```
Common Pitfalls
Common mistakes include confusing normalization with standardization, or applying normalization across columns instead of rows. Normalizer scales each sample (row) independently to unit norm, not features (columns). For feature-wise scaling, use StandardScaler or MinMaxScaler.
Also, be careful with sparse data: Normalizer accepts scipy.sparse matrices and keeps the output sparse, but feature-wise scalers such as StandardScaler cannot mean-center sparse input (pass with_mean=False to scale without densifying).
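As a quick sanity check of the sparse behavior, the sketch below (assuming scipy is installed) passes a scipy.sparse CSR matrix through Normalizer and confirms the result stays sparse:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import Normalizer

# Sparse input: two rows with some zero entries
X_sparse = sparse.csr_matrix([[4.0, 0.0, 2.0],
                              [0.0, 3.0, 9.0]])

X_norm = Normalizer(norm='l2').fit_transform(X_sparse)

print(sparse.issparse(X_norm))  # the result is still sparse
print(X_norm.toarray())         # rows have unit L2 norm; zeros stay zero
```

Because each row is scaled independently, zero entries remain zero and the sparsity pattern is preserved.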
```python
from sklearn.preprocessing import Normalizer
import numpy as np

X = np.array([[4, 1, 2],
              [1, 3, 9],
              [5, 7, 2]])

normalizer = Normalizer()

# Wrong: normalizing columns (features) by transposing
X_wrong = normalizer.fit_transform(X.T).T

# Right: normalize rows (samples)
X_right = normalizer.fit_transform(X)

print("Wrong normalization (columns):\n", X_wrong)
print("\nRight normalization (rows):\n", X_right)
```

Output

```
Wrong normalization (columns):
 [[0.6172134  0.13018891 0.21199958]
 [0.15430335 0.39056673 0.95399809]
 [0.77151675 0.91132238 0.21199958]]

Right normalization (rows):
 [[0.87287156 0.21821789 0.43643578]
 [0.10482848 0.31448545 0.94345635]
 [0.56613852 0.79259392 0.22645541]]
```
Quick Reference
| Method | Purpose | Notes |
|---|---|---|
| Normalizer(norm='l2') | Normalize each sample to unit length | Use for row-wise normalization |
| StandardScaler() | Standardize features by removing mean and scaling to unit variance | Feature-wise scaling |
| MinMaxScaler() | Scale features to a given range (default 0-1) | Feature-wise scaling |
| fit_transform(X) | Fit to data and transform it | Use on training data |
| transform(X) | Transform new data using fitted scaler | Use on test or new data |
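The fit_transform/transform distinction in the table matters most for stateful scalers. The sketch below (with made-up sample data) shows the standard pattern with StandardScaler: fit on training data, then reuse the learned statistics on test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])
X_test = np.array([[2.0, 25.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature mean and std from X_train
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

print(scaler.mean_)     # per-feature means learned from X_train
print(X_test_scaled)    # test data scaled with the training mean/std
```

Calling fit_transform() on the test set instead would leak test statistics into the scaling and make train and test features inconsistent.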
Key Takeaways
- Use sklearn.preprocessing.Normalizer to scale each sample (row) to unit norm.
- Normalization scales rows independently, unlike standardization, which scales features.
- Follow the usual pattern of fit_transform on training data and transform on new data; for the stateless Normalizer both give the same result, but stateful scalers such as StandardScaler depend on it.
- Avoid normalizing columns by mistake; Normalizer works row-wise.
- For feature scaling, consider StandardScaler or MinMaxScaler instead.
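To make the row-wise behavior concrete, a short sanity check (using only NumPy and sklearn) reproduces Normalizer by dividing each row by its own L2 norm:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 9.0]])

# Normalizer output equals each row divided by its own L2 norm
X_norm = Normalizer(norm='l2').fit_transform(X)
manual = X / np.linalg.norm(X, axis=1, keepdims=True)

print(np.allclose(X_norm, manual))      # True
print(np.linalg.norm(X_norm, axis=1))   # every row now has unit length
```

If the two results ever disagree in your own code, the most likely cause is normalizing along the wrong axis.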