Scaling and normalization put features on a common scale so they can be compared fairly. They change the magnitude of the numbers without changing the order of the values or the relationships between them.
Scaling and normalization concepts in Data Analysis Python
Introduction
Scaling or normalization is useful in situations like these:
When features have very different ranges, like age (0-100) and income (1000-100000).
Before using machine learning models that care about distance, like k-nearest neighbors.
When you want to speed up training of models like neural networks.
To avoid one feature dominating others because of its scale.
When visualizing data to see patterns clearly.
Syntax
Data Analysis Python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# data: a 2D array-like of shape (n_samples, n_features)

# For Min-Max Scaling (Normalization)
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)

# For Standard Scaling (Standardization)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
Min-Max Scaling rescales data to a fixed range, usually 0 to 1.
Standard Scaling centers data to mean 0 and scales to unit variance.
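The two definitions above correspond to simple per-column formulas. The sketch below computes them by hand with NumPy and checks the results against scikit-learn; the variable names `manual_minmax` and `manual_standard` are chosen here for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])

# Min-max scaling: (x - min) / (max - min), computed per column
col_min = data.min(axis=0)
col_max = data.max(axis=0)
manual_minmax = (data - col_min) / (col_max - col_min)

# Standard scaling: (x - mean) / std, computed per column.
# np.std defaults to the population standard deviation (ddof=0),
# which is also what StandardScaler uses.
manual_standard = (data - data.mean(axis=0)) / data.std(axis=0)

# Both comparisons should print True
print(np.allclose(manual_minmax, MinMaxScaler().fit_transform(data)))
print(np.allclose(manual_standard, StandardScaler().fit_transform(data)))
```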
Examples
This example scales the numbers 10-50 to the 0-1 range.
Data Analysis Python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[10], [20], [30], [40], [50]])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)
print(scaled)
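The 0-1 range is only the default. As a small variation on the same data, this sketch uses the `feature_range` parameter of `MinMaxScaler` to target -1 to 1 instead, then recovers the original values with `inverse_transform`. The choice of (-1, 1) is just an example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[10], [20], [30], [40], [50]], dtype=float)

# Rescale to -1..1 instead of the default 0..1
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data)

# The fitted scaler remembers the column min and max,
# so the transformation can be undone
restored = scaler.inverse_transform(scaled)

print(scaled.ravel())
print(restored.ravel())
```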
This example centers data around 0 and scales it to have standard deviation 1.
Data Analysis Python
from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[10], [20], [30], [40], [50]])

scaler = StandardScaler()
scaled = scaler.fit_transform(data)
print(scaled)
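One way to check the claim about mean 0 and standard deviation 1 is to inspect what the scaler learned and what it produced. This sketch reads the fitted `mean_` and `scale_` attributes of `StandardScaler` and then measures the scaled column.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[10], [20], [30], [40], [50]], dtype=float)

scaler = StandardScaler()
scaled = scaler.fit_transform(data)

# Statistics learned from the data during fit
print("learned mean:", scaler.mean_)
print("learned std: ", scaler.scale_)

# Statistics of the result: mean close to 0, standard deviation close to 1
print("scaled mean:", scaled.mean(axis=0))
print("scaled std: ", scaled.std(axis=0))
```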
Sample Program
This program shows how to scale two features: height and weight. First, it rescales each feature to the 0-1 range. Then, it standardizes each feature to have mean 0 and variance 1. Each column is scaled independently of the other.
Data Analysis Python
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np

# Sample data: heights in cm and weights in kg
data = np.array([[150, 50], [160, 60], [170, 65], [180, 80], [190, 90]])

# Min-Max Scaling
min_max_scaler = MinMaxScaler()
data_minmax = min_max_scaler.fit_transform(data)
print('Min-Max Scaled Data:')
print(data_minmax)

# Standard Scaling
standard_scaler = StandardScaler()
data_standard = standard_scaler.fit_transform(data)
print('\nStandard Scaled Data:')
print(data_standard)
Important Notes
Always fit the scaler on training data only, then transform test data to avoid data leakage.
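A minimal sketch of the fit-on-train, transform-on-test pattern follows. The data is randomly generated for illustration (loosely shaped like heights and weights), and the 80/20 split is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data: 100 rows, 2 features
rng = np.random.default_rng(0)
X = rng.normal(loc=[170, 70], scale=[10, 15], size=(100, 2))

X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler()
# fit_transform: learn mean and std from the training rows only
X_train_scaled = scaler.fit_transform(X_train)
# transform: reuse the training statistics; the test rows are never fitted
X_test_scaled = scaler.transform(X_test)

print("train mean:", X_train_scaled.mean(axis=0))
print("test mean: ", X_test_scaled.mean(axis=0))
```

The training columns end up with mean 0, while the test columns are usually only close to 0. That is expected: the test set is scaled with statistics it did not contribute to.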
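Min-Max scaling is very sensitive to outliers, because a single extreme value sets the minimum or maximum. Standard scaling is less affected, but outliers still shift its mean and standard deviation.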
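To see the effect of an outlier, this sketch appends one extreme value (1000, chosen arbitrarily) to the earlier 10-50 data and applies both scalers.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Five ordinary values plus one made-up outlier
data = np.array([[10], [20], [30], [40], [50], [1000]], dtype=float)

minmax = MinMaxScaler().fit_transform(data)
standard = StandardScaler().fit_transform(data)

# Look at how much room the five ordinary values get after scaling
print("min-max, ordinary values: ", minmax[:5].ravel())
print("standard, ordinary values:", standard[:5].ravel())
print("standard, outlier:        ", standard[5].ravel())
```

With Min-Max scaling, the five ordinary values are squeezed into a small slice near 0 while the outlier sits at 1. With standard scaling they are also pulled close together, since the outlier inflates both the mean and the standard deviation.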
Both methods are linear transformations, so they change a feature's location and spread but not the shape of its distribution. Skewed data stays skewed after scaling.
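A quick check of this point is to measure skewness before and after scaling. The `skewness` helper below is a hypothetical hand-written function (the third standardized moment), and the exponential sample is generated only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def skewness(values):
    """Third standardized moment: near 0 for symmetric data,
    positive for a long right tail."""
    z = (values - values.mean()) / values.std()
    return float((z ** 3).mean())

# Made-up right-skewed data
rng = np.random.default_rng(42)
skewed = rng.exponential(scale=2.0, size=(1000, 1))

skew_raw = skewness(skewed)
skew_minmax = skewness(MinMaxScaler().fit_transform(skewed))
skew_standard = skewness(StandardScaler().fit_transform(skewed))

# All three values should agree up to floating-point error
print(skew_raw, skew_minmax, skew_standard)
```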
Summary
Scaling and normalization adjust data to a common scale for fair comparison.
Min-Max scaling rescales data to a fixed range, usually 0 to 1.
Standard scaling centers data to mean 0 and scales to unit variance.