
Scaling and normalization concepts in Data Analysis Python

Introduction

Scaling and normalization put numeric features on a common scale so they can be compared fairly. They change the values themselves, but not the order of the values or the relative spacing between them within a feature.

Use them in situations like these:
When features have very different ranges, such as age (0-100) and income (1,000-100,000).
Before using machine learning models that rely on distances, such as k-nearest neighbors.
When you want to speed up training of gradient-based models such as neural networks.
To keep one feature from dominating the others just because of its scale.
When visualizing data, so that patterns in every feature are visible on the same axes.
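As a rough illustration of the distance problem, the sketch below compares Euclidean distances before and after Min-Max scaling. The `people` array and the `distance` helper are made up for this example and are not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical people: [age in years, income]
people = np.array([
    [25.0, 50000.0],
    [60.0, 51000.0],   # very different age, similar income
    [26.0, 90000.0],   # similar age, very different income
])

def distance(a, b):
    """Euclidean distance between two rows (illustrative helper)."""
    return float(np.linalg.norm(a - b))

# On raw data, income differences swamp age differences.
raw_d01 = distance(people[0], people[1])
raw_d02 = distance(people[0], people[2])

# After scaling, both features contribute on a comparable scale.
scaled = MinMaxScaler().fit_transform(people)
scaled_d01 = distance(scaled[0], scaled[1])
scaled_d02 = distance(scaled[0], scaled[2])

print('Raw distances:   ', raw_d01, raw_d02)
print('Scaled distances:', scaled_d01, scaled_d02)
```

On the raw data, the 35-year age gap barely registers next to the income gap. After scaling, the two distances are nearly equal, because each feature now spans the same 0-1 range.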
Syntax
Data Analysis Python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# data is assumed to be a 2D array-like of shape (n_samples, n_features)

# For Min-Max Scaling (Normalization)
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)

# For Standard Scaling (Standardization)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

Min-Max Scaling rescales each feature (column) to a fixed range, 0 to 1 by default.

Standard Scaling centers each feature (column) at mean 0 and scales it to unit variance.
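To make the two definitions concrete, here is a minimal sketch that computes both transformations by hand with NumPy and checks them against scikit-learn. The names `manual_minmax` and `manual_standard` are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])

# Min-Max scaling: (x - min) / (max - min), computed per column
col_min = data.min(axis=0)
col_max = data.max(axis=0)
manual_minmax = (data - col_min) / (col_max - col_min)

# Standardization: (x - mean) / std, computed per column.
# np.std defaults to the population standard deviation (ddof=0),
# which is what StandardScaler uses as well.
manual_standard = (data - data.mean(axis=0)) / data.std(axis=0)

minmax_matches = np.allclose(manual_minmax, MinMaxScaler().fit_transform(data))
standard_matches = np.allclose(manual_standard, StandardScaler().fit_transform(data))
print(minmax_matches, standard_matches)  # True True
```

Both formulas only subtract a constant and divide by a constant for each column.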

Examples
This example scales the numbers 10 to 50 into the 0-1 range. The result is 0, 0.25, 0.5, 0.75, and 1.
Data Analysis Python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[10], [20], [30], [40], [50]])
scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)
print(scaled)
This example centers the same data around 0 and scales it to standard deviation 1. The result is approximately -1.41, -0.71, 0, 0.71, and 1.41.
Data Analysis Python
from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[10], [20], [30], [40], [50]])
scaler = StandardScaler()
scaled = scaler.fit_transform(data)
print(scaled)
Sample Program

This program shows how to scale two features: height and weight. First, it rescales each feature to the 0-1 range. Then, it standardizes each feature to have mean 0 and variance 1. Each column is scaled independently of the other.

Data Analysis Python
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np

# Sample data: heights in cm and weights in kg
data = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]])

# Min-Max Scaling
min_max_scaler = MinMaxScaler()
data_minmax = min_max_scaler.fit_transform(data)
print('Min-Max Scaled Data:')
print(data_minmax)

# Standard Scaling
standard_scaler = StandardScaler()
data_standard = standard_scaler.fit_transform(data)
print('\nStandard Scaled Data:')
print(data_standard)
Important Notes

Always fit the scaler on the training data only, then use that fitted scaler to transform the test data. Fitting on the full dataset leaks information about the test set into training.
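A minimal sketch of that workflow follows. The split into `X_train` and `X_test` is done by hand and is hypothetical; in practice it would come from whatever splitting method you use.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical split: heights in cm and weights in kg
X_train = np.array([[150.0, 50.0], [160.0, 60.0], [170.0, 65.0], [180.0, 80.0]])
X_test = np.array([[190.0, 90.0]])

scaler = StandardScaler()

# fit_transform learns the mean and standard deviation from the training data only
X_train_scaled = scaler.fit_transform(X_train)

# transform reuses those training statistics; it does not refit on the test data
X_test_scaled = scaler.transform(X_test)

print('Training means used for scaling:', scaler.mean_)
print('Scaled test row:', X_test_scaled)
```

The scaled test values do not need to have mean 0 or fall in any particular range, because they are expressed relative to the training statistics.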

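Min-Max scaling is very sensitive to outliers because it depends directly on the minimum and maximum values. Standard scaling is less affected but is not robust either: outliers still shift the mean and inflate the standard deviation. For data with strong outliers, scikit-learn's RobustScaler, which uses the median and interquartile range, is usually a better fit.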
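The sketch below shows how a single outlier affects each scaler. The data is invented for illustration. `RobustScaler` is a scikit-learn class that was not used earlier in this tutorial; it is included here only for comparison.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

# Four ordinary values plus one extreme outlier
data = np.array([[10.0], [20.0], [30.0], [40.0], [1000.0]])

minmax = MinMaxScaler().fit_transform(data)
standard = StandardScaler().fit_transform(data)
robust = RobustScaler().fit_transform(data)  # centers on median, scales by IQR

# How far apart the four ordinary values end up after scaling
minmax_spread = float(minmax[3, 0] - minmax[0, 0])
standard_spread = float(standard[3, 0] - standard[0, 0])
robust_spread = float(robust[3, 0] - robust[0, 0])

print('Min-Max spread of ordinary values: ', minmax_spread)
print('Standard spread of ordinary values:', standard_spread)
print('Robust spread of ordinary values:  ', robust_spread)
```

With Min-Max and standard scaling, the four ordinary values are squeezed into a narrow band because the outlier stretches the range and the standard deviation. With the median and interquartile range, they keep a usable spread.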

Min-Max and standard scaling are linear transformations, so they do not change the shape of the data distribution, only its location and scale. Skewed data stays skewed.
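One way to check this is to compare a shape statistic before and after scaling. The `skewness` helper below is an illustrative hand-rolled function, not a library call, and the data is made up.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def skewness(x):
    """Population skewness: mean of cubed z-scores (illustrative helper)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Right-skewed sample data
data = np.array([[1.0], [2.0], [2.0], [3.0], [10.0]])

skew_raw = skewness(data)
skew_minmax = skewness(MinMaxScaler().fit_transform(data))
skew_standard = skewness(StandardScaler().fit_transform(data))

print(skew_raw, skew_minmax, skew_standard)
```

All three values agree up to floating-point error, since skewness is unaffected by shifting and positive rescaling.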

Summary

Scaling and normalization adjust data to a common scale for fair comparison.

Min-Max scaling rescales data to a fixed range, usually 0 to 1.

Standard scaling centers data to mean 0 and scales to unit variance.