
Imbalanced class handling (SMOTE, class weights) in ML Python - Model Pipeline Trace


This pipeline shows how to handle imbalanced classes in a dataset using SMOTE to create synthetic samples and class weights to help the model learn better from minority classes.

Data Flow - 6 Stages
Stage 1: Raw data input
  Input:  1000 rows x 5 columns
  What:   Original dataset with imbalanced classes (90% class 0, 10% class 1)
  Output: 1000 rows x 5 columns
  Sample: [Feature1=2.3, Feature2=1.1, ..., Class=0]

Stage 2: Train/test split
  Input:  1000 rows x 5 columns
  What:   Split data into training (80%) and testing (20%) sets
  Output: Train: 800 rows x 5 columns, Test: 200 rows x 5 columns
  Sample: Train: [Feature1=2.3, ..., Class=0], Test: [Feature1=1.5, ..., Class=1]

Stage 3: SMOTE oversampling
  Input:  Train: 800 rows x 5 columns
  What:   Create synthetic minority-class samples to balance the classes in the training set
  Output: Train: 1440 rows x 5 columns (balanced classes)
  Sample: Synthetic: [Feature1=2.0, Feature2=1.3, ..., Class=1]

Stage 4: Feature scaling
  Input:  Train: 1440 rows x 5 columns, Test: 200 rows x 5 columns
  What:   Fit the scaler on the training features, then transform both train and test features to similar ranges
  Output: Train: 1440 rows x 5 columns (scaled), Test: 200 rows x 5 columns (scaled)
  Sample: [Feature1=0.45, Feature2=0.33, ..., Class=1]

Stage 5: Model training with class weights
  Input:  Train: 1440 rows x 5 columns (scaled)
  What:   Train the model using class weights to emphasize the minority class
  Output: Trained model
  Note:   Class weights: {0: 1.0, 1: 1.0} (effectively balanced after SMOTE)

Stage 6: Model evaluation
  Input:  Test: 200 rows x 5 columns (scaled)
  What:   Evaluate model performance on the still-imbalanced test set
  Output: Performance metrics
  Sample: Accuracy=0.92, Precision=0.85, Recall=0.88
Training Trace - Epoch by Epoch

Loss (per epoch, rounded to the nearest 0.1)
0.7 | *
0.6 |
0.5 |     *
0.4 |         *
0.3 |             *
0.2 |                 *
0.1 |
    +---------------------
      1   2   3   4   5   Epochs
Epoch | Loss ↓ | Accuracy ↑ | Observation
------+--------+------------+-----------------------------------------------------
  1   |  0.65  |    0.70    | Model starts learning; loss high, accuracy moderate
  2   |  0.48  |    0.80    | Loss decreases, accuracy improves
  3   |  0.35  |    0.87    | Model learns the minority class better
  4   |  0.28  |    0.90    | Loss continues to decrease, accuracy rises
  5   |  0.22  |    0.92    | Training converges with a good balance
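The shape of that trace can be reproduced in spirit with a tiny logistic-regression training loop. This numpy sketch uses toy balanced data and full-batch gradient descent (all sizes and the learning rate are assumptions for illustration); it only shows how the per-epoch loss is computed and why it falls, so the numbers will not match the table exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy, roughly balanced data, standing in for the post-SMOTE training set
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true + 0.5 * rng.normal(size=200) > 0).astype(float)

w, b, lr = np.zeros(5), 0.0, 0.5
losses = []
for epoch in range(5):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))         # sigmoid probabilities
    loss = -np.mean(y * np.log(p + 1e-9)
                    + (1 - y) * np.log(1 - p + 1e-9))  # binary cross-entropy
    losses.append(loss)
    w -= lr * (X.T @ (p - y) / len(y))             # gradient step on weights
    b -= lr * np.mean(p - y)                       # gradient step on bias
    print(f"epoch {epoch + 1}: loss {loss:.3f}")
```

With zero-initialised weights the first epoch's loss is ln 2 ≈ 0.69, close to the 0.65 in the table, and each full-batch step on this convex loss decreases it.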
Prediction Trace - 4 Layers
Layer 1: Input features
Layer 2: Hidden layer with ReLU activation
Layer 3: Output layer with sigmoid activation
Layer 4: Threshold decision
Model Quiz - 3 Questions
Test your understanding
What is the main purpose of SMOTE in this pipeline?
  A. To scale features to the same range
  B. To remove samples from the majority class
  C. To create synthetic samples for the minority class
  D. To split data into train and test sets
Key Insight
Handling imbalanced classes with SMOTE and class weights helps the model learn minority classes better, improving overall performance and fairness.