
Imbalanced class handling (SMOTE, class weights) in ML Python

Introduction

Sometimes one class in a dataset has many more examples than the others. A model trained on such imbalanced data can become biased: it learns the big class well and ignores the small one. Techniques like SMOTE and class weights help the model learn all classes fairly.

When you have very few examples of one type compared to others, like fraud detection where fraud cases are rare.
When your model always guesses the big group and ignores the small group, missing important cases.
When you want your model to be fair and not biased toward the bigger group.
When you want to improve accuracy for the smaller group without losing overall performance.
Syntax
ML Python
from imblearn.over_sampling import SMOTE

smote = SMOTE(sampling_strategy='minority')
X_resampled, y_resampled = smote.fit_resample(X, y)

# For class weights in sklearn models:
model = SomeClassifier(class_weight='balanced')
model.fit(X_train, y_train)

SMOTE creates new examples for the smaller group by mixing existing ones.

Class weights tell the model to pay more attention to smaller groups during training.

Examples
This creates new samples only for the smallest class.
ML Python
smote = SMOTE(sampling_strategy='minority')
X_res, y_res = smote.fit_resample(X, y)
This resamples until the minority class has half as many samples as the majority class (a 1:2 ratio).
ML Python
smote = SMOTE(sampling_strategy=0.5)
X_res, y_res = smote.fit_resample(X, y)
This sets class weights automatically based on class frequencies.
ML Python
model = LogisticRegression(class_weight='balanced')
model.fit(X_train, y_train)
This manually sets class 1 to be 5 times more important than class 0.
ML Python
model = RandomForestClassifier(class_weight={0:1, 1:5})
model.fit(X_train, y_train)
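If you want to see the numbers that class_weight='balanced' actually uses, scikit-learn exposes them through compute_class_weight. Each weight is n_samples / (n_classes * class_count), so in this toy example a class with 10 out of 100 samples gets weight 5.0:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 90 samples of class 0 and 10 samples of class 1
y = np.array([0] * 90 + [1] * 10)

weights = compute_class_weight(class_weight='balanced',
                               classes=np.array([0, 1]), y=y)
print(weights)  # [0.5555... 5.0] -> class 1 counts 9x as much as class 0
```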
Sample Model

This program creates a dataset where one class is much smaller. It trains a logistic regression model three ways: normal, with SMOTE to add samples, and with class weights to pay more attention to the small class. It prints reports showing how well each method works.

ML Python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

# Create imbalanced data
X, y = make_classification(n_classes=2, class_sep=2,
                           weights=[0.9, 0.1], n_informative=3,
                           n_redundant=1, flip_y=0, n_features=5,
                           n_clusters_per_class=1, n_samples=200, random_state=42)

# Split data (stratify keeps the class ratio the same in train and test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Without handling imbalance
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Without imbalance handling:")
print(classification_report(y_test, y_pred))

# Using SMOTE
smote = SMOTE(random_state=42)
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)
model_smote = LogisticRegression(max_iter=1000)
model_smote.fit(X_train_smote, y_train_smote)
y_pred_smote = model_smote.predict(X_test)
print("With SMOTE:")
print(classification_report(y_test, y_pred_smote))

# Using class weights
model_cw = LogisticRegression(class_weight='balanced', max_iter=1000)
model_cw.fit(X_train, y_train)
y_pred_cw = model_cw.predict(X_test)
print("With class weights:")
print(classification_report(y_test, y_pred_cw))
Important Notes

SMOTE works by creating new synthetic examples, not just copying existing ones.

Class weights are simpler to apply because no new data is created, but neither method always wins; compare both on your data.

Always check model performance on real test data to see if imbalance handling helps.
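To see why plain accuracy is not enough here, consider a hypothetical model that always predicts the majority class. On a 90/10 test set it scores 90% accuracy while catching zero minority cases, which is why the minority-class recall in classification_report matters:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 90 majority (0) and 10 minority (1) test labels
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)  # model that always predicts class 0

print(accuracy_score(y_true, y_pred))  # 0.9 -> looks good
print(recall_score(y_true, y_pred))    # 0.0 -> misses every minority case
```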

Summary

Imbalanced data can cause models to ignore small groups.

SMOTE creates new samples to balance classes.

Class weights tell the model to focus more on smaller classes.