How to Use MinMaxScaler in sklearn with Python

Use MinMaxScaler from sklearn.preprocessing to scale each feature to a given range, by default 0 to 1. Fit the scaler on your training data with fit() or fit_transform(), then apply the same fitted scaler to test or new data with transform().

📐

Syntax

MinMaxScaler scales each feature (column) to a given range, which is (0, 1) by default. You create an instance, optionally set feature_range, fit it to your data, and then transform the data.

MinMaxScaler(feature_range=(min, max)): creates the scaler with the desired output range.
fit(X): computes min and max values from data X.
transform(X): scales data X using the fitted min and max.
fit_transform(X): fits and transforms in one step.

python

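import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Placeholder input: any 2D array of shape (n_samples, n_features)
X = np.array([[1.0], [5.0], [10.0]])

scaler = MinMaxScaler(feature_range=(0, 1))  # Create scaler
scaler.fit(X)  # Learn per-feature min and max from X
X_scaled = scaler.transform(X)  # Scale X using the learned min and max

# Or combine fit and transform in one call
X_scaled = scaler.fit_transform(X)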
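
To make the fit and transform steps concrete, here is a minimal sketch that reproduces the scaling by hand with NumPy and compares it with MinMaxScaler. The sample values and the names lo, hi, col_min, and col_max are illustrative assumptions, not part of the sklearn API.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative data: 3 samples, 2 features
X = np.array([[10.0, 200.0], [15.0, 300.0], [20.0, 400.0]])
lo, hi = 0.0, 1.0  # Target feature_range (hypothetical names)

# What fit() learns: the minimum and maximum of each column
col_min = X.min(axis=0)
col_max = X.max(axis=0)

# What transform() applies: rescale each column to [0, 1],
# then stretch and shift it into [lo, hi]
X_std = (X - col_min) / (col_max - col_min)
X_manual = X_std * (hi - lo) + lo

X_sklearn = MinMaxScaler(feature_range=(lo, hi)).fit_transform(X)
print(np.allclose(X_manual, X_sklearn))  # True
```

Note that this hand-written version divides by zero when a column is constant, so it is only meant to illustrate the arithmetic.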

💻

Example

This example shows how to scale a small dataset with MinMaxScaler. Each column is scaled independently, so its smallest value maps to 0 and its largest to 1.

python

from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Sample data: 3 samples, 2 features
X = np.array([[10, 200], [15, 300], [20, 400]])

scaler = MinMaxScaler()  # Default range 0 to 1
X_scaled = scaler.fit_transform(X)

print("Original data:\n", X)
print("Scaled data:\n", X_scaled)

Output

Original data:
 [[ 10 200]
 [ 15 300]
 [ 20 400]]
Scaled data:
 [[0.  0. ]
 [0.5 0.5]
 [1.  1. ]]
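
The same data can also be scaled to a range other than 0 to 1. The sketch below assumes a target range of -1 to 1, chosen only for illustration, and passes it through feature_range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same sample data: 3 samples, 2 features
X = np.array([[10, 200], [15, 300], [20, 400]])

# Assumed target range for this sketch: -1 to 1
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

print("Scaled to (-1, 1):\n", X_scaled)
```

Each column should now run from -1 at its minimum to 1 at its maximum, with the middle row landing at about 0.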

⚠️

Common Pitfalls

Common mistakes include:

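Calling transform() on test data before the scaler has been fitted on training data, which raises a NotFittedError.
Fitting a separate scaler on the test data, which puts training and test data on inconsistent scales. Fitting on the combined training and test data is also a mistake, because it leaks test information into training.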
Forgetting to transform new data with the same scaler used on training data.

Always fit the scaler only on training data, then use transform() on test or new data.
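
As a small illustration of the first pitfall, the sketch below calls transform() on a scaler that was never fitted and catches the resulting exception. The variable name raised is just a helper for this example.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import MinMaxScaler

X_test = np.array([[12, 250], [18, 350]])

scaler = MinMaxScaler()  # Created but never fitted

raised = False  # Hypothetical flag used only in this sketch
try:
    scaler.transform(X_test)  # No min/max learned yet
except NotFittedError as exc:
    raised = True
    print("Cannot transform yet:", exc)
```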

python

from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Training data
X_train = np.array([[10, 200], [15, 300], [20, 400]])
# Test data
X_test = np.array([[12, 250], [18, 350]])

scaler = MinMaxScaler()
scaler.fit(X_train)  # Fit only on training data

X_train_scaled = scaler.transform(X_train)  # Transform training data
X_test_scaled = scaler.transform(X_test)    # Transform test data with same scaler

print("Scaled training data:\n", X_train_scaled)
print("Scaled test data:\n", X_test_scaled)

Output

Scaled training data:
 [[0.  0. ]
 [0.5 0.5]
 [1.  1. ]]
Scaled test data:
 [[0.2  0.25]
 [0.8  0.75]]
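
For contrast, here is a sketch of what happens when a second scaler is fitted on the test data instead of reusing the training scaler. The names wrong and right are illustrative labels only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[10, 200], [15, 300], [20, 400]])
X_test = np.array([[12, 250], [18, 350]])

# Pitfall: a new scaler fitted on the test set uses the test min and max
wrong = MinMaxScaler().fit_transform(X_test)

# Recommended: reuse the min and max learned from the training set
right = MinMaxScaler().fit(X_train).transform(X_test)

print("Refit on test data:\n", wrong)
print("Training-fitted scaler:\n", right)
```

With the refit scaler, the two test rows are stretched to 0 and 1 in every column, so they no longer share a scale with the training data.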

📊

Quick Reference

Summary tips for using MinMaxScaler:

Use fit() on training data only.
Use transform() on test or new data.
Default scaling range is 0 to 1, but you can change it with feature_range.
Use fit_transform() to fit and scale training data in one step.

✅

Key Takeaways

Fit MinMaxScaler only on training data to avoid data leakage.

Transform test and new data using the scaler fitted on training data.

Default scaling range is 0 to 1 but can be customized with feature_range.

Use fit_transform() to fit and scale training data in one step.

MinMaxScaler scales each feature independently to the specified range.
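
One way to see the per-feature behavior is to inspect what the scaler learned. The sketch below reads the fitted attributes data_min_ and data_max_, which hold one value per column.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10, 200], [15, 300], [20, 400]])
scaler = MinMaxScaler().fit(X)

# One minimum and one maximum per feature (column)
print("Per-feature min:", scaler.data_min_)
print("Per-feature max:", scaler.data_max_)
```

Here the first column is scaled using 10 and 20, and the second using 200 and 400, regardless of the other column's values.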