How to Use StandardScaler in sklearn with Python
Use StandardScaler from sklearn.preprocessing to scale features by removing the mean and scaling to unit variance. First, create a scaler object, then fit it to your data with fit() and transform your data with transform(), or combine both steps with fit_transform().
Syntax
The basic usage of StandardScaler involves creating an instance, fitting it to your data to learn the mean and standard deviation, and then transforming the data to scale it.
- scaler = StandardScaler(): Creates the scaler object.
- scaler.fit(X): Computes the mean and std of the features in X.
- scaler.transform(X): Scales X using the learned parameters.
- scaler.fit_transform(X): Fits and transforms X in one step.
python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X)                    # Learn mean and std from data X
X_scaled = scaler.transform(X)   # Scale data

# Or combine both steps:
X_scaled = scaler.fit_transform(X)
Example
This example shows how to scale a small dataset using StandardScaler. It fits the scaler to the data, transforms it, and prints the original and scaled data.
python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Sample data: 3 samples, 2 features
X = np.array([[1, 2], [3, 4], [5, 6]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("Original data:\n", X)
print("\nScaled data:\n", X_scaled)
Output
Original data:
[[1 2]
[3 4]
[5 6]]
Scaled data:
[[-1.22474487 -1.22474487]
[ 0. 0. ]
[ 1.22474487 1.22474487]]
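To confirm the transformation, you can check that each scaled column has mean 0 and standard deviation 1, and recover the original values with inverse_transform(). A quick sketch using the same sample data:

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each scaled column has mean ~0 and std ~1
print("Column means:", X_scaled.mean(axis=0))
print("Column stds: ", X_scaled.std(axis=0))

# inverse_transform undoes the scaling
X_restored = scaler.inverse_transform(X_scaled)
print("Restored:\n", X_restored)
```

Note that StandardScaler uses the population standard deviation (ddof=0), which matches NumPy's np.std default.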
Common Pitfalls
Common mistakes when using StandardScaler include:
- Fitting the scaler separately on training and test data, which causes data leakage. Always fit on training data only.
- Not transforming test data with the same scaler fitted on training data.
- Applying scaling before splitting data, which leaks information from test to train.
Correct usage fits the scaler on training data and transforms both training and test data with it.
python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Correct: fit the scaler on training data only
X_train = np.array([[1, 2], [3, 4]])
X_test = np.array([[5, 6]])

scaler = StandardScaler()
scaler.fit(X_train)                           # Fit only on training data
X_train_scaled = scaler.transform(X_train)

# Do not fit again on test data (causes leakage):
# scaler.fit(X_test)  # Don't do this
X_test_scaled = scaler.transform(X_test)      # Use the same scaler

print("Train scaled:\n", X_train_scaled)
print("Test scaled:\n", X_test_scaled)
Output
Train scaled:
[[-1. -1.]
[ 1. 1.]]
Test scaled:
[[3. 3.]]
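A convenient way to avoid these pitfalls, especially with cross-validation, is to wrap the scaler and the model together in an sklearn Pipeline: calling fit() then scales using only the training data automatically. A minimal sketch with a small synthetic dataset (the data and model choice here are illustrative, not from the example above):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Synthetic dataset with features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * [10, 0.1]
y = (X[:, 0] / 10 + X[:, 1] * 10 > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits StandardScaler on the training split only,
# then applies the same scaling when predicting on the test split
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
print("Test accuracy:", pipe.score(X_test, y_test))
```

The same pipeline object can be passed to cross_val_score, so the scaler is refit on each training fold and leakage never occurs.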
Quick Reference
Remember these key points when using StandardScaler:
- Use fit() on training data only.
- Use transform() on any data to scale it.
- fit_transform() is a shortcut for fitting and transforming training data.
- Scaled data has mean 0 and standard deviation 1 per feature.
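After fitting, the learned parameters are exposed as attributes (mean_ and scale_), which can help when debugging scaling issues. A short sketch using the small dataset from the example above:

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])
scaler = StandardScaler().fit(X)

print("Per-feature mean:", scaler.mean_)    # [3. 4.]
print("Per-feature std: ", scaler.scale_)   # population std of each column
```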
Key Takeaways
- Always fit StandardScaler on training data only to avoid data leakage.
- Use fit_transform() to scale training data in one step.
- Transform test or new data using the scaler fitted on training data.
- StandardScaler centers each feature to mean 0 and scales it to unit variance.
- Scaling improves many machine learning models by normalizing feature ranges.