Feature Scaling in Machine Learning with Python using sklearn
sklearn.preprocessing provides tools like StandardScaler and MinMaxScaler to easily perform feature scaling, which helps models learn better and faster.
How It Works
Feature scaling adjusts the range of data features so they are on a similar scale. Imagine you have a dataset with height in centimeters and weight in kilograms. Height values might be around 150-200, while weight might be 50-100. Without scaling, the model might think height is more important just because its numbers are bigger.
Scaling methods like standardization subtract the mean and divide by the standard deviation, producing data centered around zero with a spread of one. Another method, normalization (min-max scaling), rescales data to a fixed range such as 0 to 1. This helps machine learning algorithms treat all features equally and speeds up learning.
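As a quick sketch of the normalization method described above, MinMaxScaler rescales each column independently so its minimum maps to 0 and its maximum maps to 1 (the height/weight values here are illustrative):

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Illustrative data: height (cm), weight (kg)
data = np.array([[170.0, 65.0], [180.0, 80.0], [160.0, 55.0], [175.0, 75.0]])

# MinMaxScaler defaults to feature_range=(0, 1):
# scaled = (x - column_min) / (column_max - column_min)
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data)

print(data_scaled)
# Heights 160-180 map to 0-1, weights 55-80 map to 0-1,
# so both columns now share the same range.
```

After this transformation, each column's minimum is exactly 0 and its maximum exactly 1, regardless of the original units.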
Example
This example shows how to use StandardScaler from sklearn.preprocessing to scale features in Python.
from sklearn.preprocessing import StandardScaler
import numpy as np

# Sample data: height (cm), weight (kg)
data = np.array([[170, 65], [180, 80], [160, 55], [175, 75]])

scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

print("Original data:\n", data)
print("\nScaled data:\n", data_scaled)
When to Use
Use feature scaling when your machine learning model depends on the distance or magnitude of features, such as in algorithms like k-nearest neighbors, support vector machines, and gradient descent-based models like linear regression or neural networks.
It is especially important when features have different units or scales, to prevent bias toward features with larger values. For example, in predicting house prices, features like area (square feet) and number of rooms should be scaled so the model treats them fairly.
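To illustrate the house-price point above, a common pattern is to chain the scaler and a distance-based model in a pipeline so both features contribute on the same scale. The data and labels here are hypothetical, made up just for the sketch:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Hypothetical housing data: [area (sq ft), number of rooms]
X = np.array([[1400, 3], [1600, 3], [2400, 4],
              [3000, 5], [900, 2], [1100, 2]], dtype=float)
y = np.array([0, 0, 1, 1, 0, 0])  # made-up labels, e.g. 1 = "expensive"

# Without scaling, distances would be dominated by area (values in the
# thousands) while rooms (single digits) would barely matter. The pipeline
# standardizes both features before k-NN measures distances.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)

prediction = model.predict(np.array([[2000.0, 4.0]]))
print(prediction)
```

Using a pipeline also guarantees the scaler is fit only on the data passed to fit, which matters once you evaluate on held-out data.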
Key Points
- Feature scaling makes data features comparable by adjusting their range.
- StandardScaler standardizes features to zero mean and unit variance.
- MinMaxScaler rescales features to a fixed range, usually 0 to 1.
- Scaling improves model training speed and accuracy for many algorithms.
- Always fit the scaler on training data and apply the same transformation to test data.
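The last point above is the pattern that prevents data leakage: statistics (mean and standard deviation) are learned from the training set only, then reused to transform the test set. A minimal sketch, with illustrative height/weight values:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

# Illustrative data: height (cm), weight (kg)
X = np.array([[170, 65], [180, 80], [160, 55],
              [175, 75], [165, 60], [185, 90]], dtype=float)

X_train, X_test = train_test_split(X, test_size=0.33, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse training statistics; do NOT refit

print("Training mean learned by scaler:", scaler.mean_)
```

Calling fit_transform on the test set instead would leak information about the test distribution into preprocessing and make evaluation overly optimistic.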