
How to Scale Features in Python Using sklearn

To scale features in Python, use sklearn.preprocessing scalers like StandardScaler or MinMaxScaler. These tools transform your data so all features have similar ranges, improving model training and predictions.
📐 Syntax

Feature scaling in sklearn uses scaler classes with these steps:

  • Import the scaler from sklearn.preprocessing.
  • Create a scaler object (e.g., StandardScaler()).
  • Fit the scaler to your data using fit() or fit_transform().
  • Transform your data with transform() or use fit_transform() to do both at once.

Example syntax:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
```
💻 Example

This example shows how to scale a small dataset using StandardScaler and MinMaxScaler. It prints the original and scaled data to compare.

```python
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import numpy as np

# Sample data with 3 features on very different scales
data = np.array([[10, 200, 3000],
                 [20, 300, 4000],
                 [30, 400, 5000],
                 [40, 500, 6000]])

# StandardScaler: rescales each feature to mean 0, std 1
scaler_std = StandardScaler()
scaled_std = scaler_std.fit_transform(data)

# MinMaxScaler: rescales each feature to the range [0, 1]
scaler_minmax = MinMaxScaler()
scaled_minmax = scaler_minmax.fit_transform(data)

print("Original data:\n", data)
print("\nStandardScaler scaled data:\n", scaled_std)
print("\nMinMaxScaler scaled data:\n", scaled_minmax)
```
Output

```
Original data:
 [[  10  200 3000]
 [  20  300 4000]
 [  30  400 5000]
 [  40  500 6000]]

StandardScaler scaled data:
 [[-1.34164079 -1.34164079 -1.34164079]
 [-0.4472136  -0.4472136  -0.4472136 ]
 [ 0.4472136   0.4472136   0.4472136 ]
 [ 1.34164079  1.34164079  1.34164079]]

MinMaxScaler scaled data:
 [[0.         0.         0.        ]
 [0.33333333 0.33333333 0.33333333]
 [0.66666667 0.66666667 0.66666667]
 [1.         1.         1.        ]]
```
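If you later need the original units back (for reporting or plotting, say), a fitted scaler remembers the statistics it learned, so the transformation can be undone with inverse_transform(). A minimal sketch reusing the same sample data:

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[10, 200, 3000],
                 [20, 300, 4000],
                 [30, 400, 5000],
                 [40, 500, 6000]])

scaler = StandardScaler()
scaled = scaler.fit_transform(data)

# The fitted scaler stores the per-feature statistics it learned
print("Means:", scaler.mean_)    # per-feature means
print("Scales:", scaler.scale_)  # per-feature standard deviations

# inverse_transform undoes the scaling, recovering the original values
restored = scaler.inverse_transform(scaled)
print("Recovered original:", np.allclose(restored, data))
```

This is handy when a model predicts in scaled units and you want the answer back in real-world units.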
⚠️ Common Pitfalls

Common mistakes when scaling features include:

  • Fitting a separate scaler on the test data, which puts training and test data on inconsistent scales and hurts model performance.
  • Fitting the scaler on the full dataset (train and test together), which leaks test-set statistics into training.
  • Scaling categorical features, which should be encoded (e.g., one-hot) rather than scaled.

Always fit the scaler on training data, then transform both training and test data with the same scaler.

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Training data
train = np.array([[1, 2], [3, 4], [5, 6]])
# Test data
test = np.array([[7, 8], [9, 10]])

# Wrong: fitting a separate scaler on train and test
scaler_wrong_train = StandardScaler()
train_scaled_wrong = scaler_wrong_train.fit_transform(train)

scaler_wrong_test = StandardScaler()
test_scaled_wrong = scaler_wrong_test.fit_transform(test)

# Right: fit on train, transform both train and test
scaler_right = StandardScaler()
train_scaled_right = scaler_right.fit_transform(train)
test_scaled_right = scaler_right.transform(test)

print("Wrong train scaled:\n", train_scaled_wrong)
print("Wrong test scaled:\n", test_scaled_wrong)
print("Right train scaled:\n", train_scaled_right)
print("Right test scaled:\n", test_scaled_right)
```
Output

```
Wrong train scaled:
 [[-1.22474487 -1.22474487]
 [ 0.          0.        ]
 [ 1.22474487  1.22474487]]
Wrong test scaled:
 [[-1. -1.]
 [ 1.  1.]]
Right train scaled:
 [[-1.22474487 -1.22474487]
 [ 0.          0.        ]
 [ 1.22474487  1.22474487]]
Right test scaled:
 [[2.44948974 2.44948974]
 [3.67423461 3.67423461]]
```

Notice how the wrongly scaled test data looks deceptively similar to the training data: fitting a fresh scaler on the test set erases the fact that its values (7 to 10) lie well outside the training range (1 to 6). The correctly scaled test values (around 2.4 to 3.7) preserve that information.
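A convenient way to avoid this mistake entirely is to wrap the scaler and model in an sklearn Pipeline: fit() fits the scaler on the training data only, and predict() automatically reuses the same fitted scaler on new data. A sketch with hypothetical toy data (the arrays and labels here are made up for illustration):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import numpy as np

# Hypothetical toy data: 6 samples, 2 features, binary labels
X_train = np.array([[1, 2], [3, 4], [5, 6], [2, 1], [4, 3], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[7, 8], [8, 7]])

# The pipeline fits the scaler on training data only, then reuses
# the same fitted scaler whenever it transforms new data.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
print(pipe.predict(X_test))
```

This also keeps cross-validation honest: tools like cross_val_score refit the whole pipeline inside each fold, so test folds never leak into the scaler.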
📊 Quick Reference

Summary tips for feature scaling in Python:

  • StandardScaler: scales data to mean 0 and standard deviation 1.
  • MinMaxScaler: scales data to a fixed range, usually 0 to 1.
  • Always fit scaler on training data only.
  • Apply the same scaler to test or new data with transform().
  • Do not scale categorical features directly.
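When a dataset mixes numeric and categorical columns, sklearn's ColumnTransformer lets you scale only the numeric columns and encode the categorical ones in a single step. A minimal sketch with a hypothetical mixed array (column names and values are made up for illustration):

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import numpy as np

# Hypothetical mixed data: two numeric columns, one categorical column
X = np.array([[10, 3000, "red"],
              [20, 4000, "blue"],
              [30, 5000, "red"]], dtype=object)

# Scale columns 0 and 1; one-hot encode column 2
ct = ColumnTransformer([
    ("num", StandardScaler(), [0, 1]),
    ("cat", OneHotEncoder(), [2]),
])
X_out = ct.fit_transform(X)
print(X_out.shape)  # 2 scaled columns + 2 one-hot columns
```

Like a Pipeline, a fitted ColumnTransformer can be reused on test data with transform(), so the training-only-fit rule still applies.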

Key Takeaways

  • Use sklearn scalers like StandardScaler or MinMaxScaler to scale features for better model performance.
  • Always fit the scaler on training data only, then transform test data with the same scaler.
  • StandardScaler centers data to mean 0 and std 1; MinMaxScaler scales data to a fixed range like 0 to 1.
  • Avoid scaling categorical features directly; encode them properly before scaling if needed.
  • Scaling ensures features have similar ranges, helping many machine learning algorithms work better.