Imbalanced class handling (SMOTE, class weights) in ML Python

Introduction

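Sometimes one class in a dataset is much larger than the others. A model trained on such data can reach high accuracy by favoring the large class while performing poorly on the small one. Techniques such as SMOTE and class weights counteract this so the model learns every class well.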

Use these techniques in the following situations:

When one class has very few examples compared to the others, as in fraud detection, where fraud cases are rare.

When your model almost always predicts the majority class and misses the minority cases that matter.

When you want a model that is not biased toward the majority class.

When you want to improve precision and recall on the minority class without giving up much overall performance.

Syntax

ML Python

from imblearn.over_sampling import SMOTE

smote = SMOTE(sampling_strategy='minority')
X_resampled, y_resampled = smote.fit_resample(X, y)

# For class weights in sklearn models:
model = SomeClassifier(class_weight='balanced')
model.fit(X_train, y_train)

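SMOTE (Synthetic Minority Over-sampling Technique) creates new examples for the smaller class by interpolating between an existing minority sample and one of its nearest minority-class neighbors.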
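
To make the interpolation idea concrete, here is a minimal NumPy sketch. It is not the imblearn implementation; the function name `smote_sketch` and its parameters are hypothetical, chosen only for illustration, and it assumes at least two minority samples.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority samples X_min (illustration only)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # a sample cannot have more neighbors than there are other samples

    # Pairwise Euclidean distances between minority samples
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)               # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest neighbors

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                  # pick a random minority sample
        nb = neighbors[base, rng.integers(k)]   # pick one of its nearest neighbors
        gap = rng.random()                      # position along the segment, in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic

# Four minority points on the corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_sketch(X_min, n_new=6, k=2)
print(new_points.shape)
```

Each synthetic point lies on the line segment between two real minority samples, which is why SMOTE output differs from plain duplication.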

Class weights make errors on smaller classes count more during training, so the model pays more attention to them.
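
As a rough illustration of what `class_weight='balanced'` computes, scikit-learn documents the formula as `n_samples / (n_classes * count_of_class)`. The sketch below reproduces that arithmetic in plain Python; the helper name `balanced_class_weights` is hypothetical and not part of scikit-learn.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# 90 samples of class 0 and 10 samples of class 1
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)
```

With this 90/10 split, the rare class receives a weight of 5.0 and the common class a weight of about 0.56, so the rarer a class is, the larger its weight.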

Examples

This resamples only the minority class, adding synthetic samples until it matches the majority class in size.

ML Python

smote = SMOTE(sampling_strategy='minority')
X_res, y_res = smote.fit_resample(X, y)

This adds synthetic samples until the smaller class is half the size of the bigger class. A float sampling_strategy is only supported for binary classification.

ML Python

smote = SMOTE(sampling_strategy=0.5)
X_res, y_res = smote.fit_resample(X, y)
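
To see what a float ratio means in sample counts, the arithmetic can be sketched as follows. The helper `n_synthetic_needed` is hypothetical and only mirrors the idea that the ratio is the minority count divided by the majority count after resampling; it is not an imblearn function.

```python
def n_synthetic_needed(n_majority, n_minority, ratio):
    """Number of synthetic minority samples needed to reach minority/majority == ratio."""
    target = int(ratio * n_majority)
    if target < n_minority:
        # Over-sampling can only add samples, so a ratio this low is not reachable
        raise ValueError("ratio would require removing minority samples")
    return target - n_minority

# 900 majority and 100 minority samples
print(n_synthetic_needed(900, 100, 0.5))  # half the majority size
print(n_synthetic_needed(900, 100, 1.0))  # fully balanced
```

For 900 majority and 100 minority samples, a ratio of 0.5 means a target of 450 minority samples, so 350 synthetic samples are added.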

This sets class weights automatically, inversely proportional to class frequencies.

ML Python

from sklearn.linear_model import LogisticRegression
model = LogisticRegression(class_weight='balanced')
model.fit(X_train, y_train)

This manually makes errors on class 1 count 5 times as much as errors on class 0.

ML Python

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(class_weight={0:1, 1:5})
model.fit(X_train, y_train)
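
For a loss-based model such as logistic regression, one way to picture a weight dictionary like `{0: 1, 1: 5}` is as a multiplier on each sample's loss. The sketch below is a simplified assumption, not scikit-learn's code: the function `weighted_log_loss` is hypothetical, regularization is omitted, and tree-based models such as random forests apply the weights differently.

```python
import math

def weighted_log_loss(y_true, p_pred, class_weight):
    """Sum of per-sample log losses, each scaled by the weight of its true class.

    p_pred holds the predicted probability of class 1 for each sample.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p_of_true = p if y == 1 else 1.0 - p  # probability assigned to the true class
        total += -class_weight[y] * math.log(p_of_true)
    return total

# One sample per class, both equally wrong (true class gets probability 0.25)
y_true = [0, 1]
p_pred = [0.75, 0.25]
plain = weighted_log_loss(y_true, p_pred, {0: 1, 1: 1})
weighted = weighted_log_loss(y_true, p_pred, {0: 1, 1: 5})
print(plain, weighted)
```

The mistake on class 1 contributes five times as much to the weighted total as the equally bad mistake on class 0, which pushes training toward getting class 1 right.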

Sample Model

This program creates a dataset where one class is much smaller than the other. It trains a logistic regression model three ways: with no imbalance handling, with SMOTE to add synthetic samples to the training data, and with class weights to pay more attention to the small class. It prints a classification report for each so you can compare how well each method works.

ML Python

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

# Create imbalanced data
X, y = make_classification(n_classes=2, class_sep=2,
                           weights=[0.9, 0.1], n_informative=3,
                           n_redundant=1, flip_y=0, n_features=5,
                           n_clusters_per_class=1, n_samples=200, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Without handling imbalance
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Without imbalance handling:")
print(classification_report(y_test, y_pred))

# Using SMOTE (resample the training split only; the test set stays untouched)
smote = SMOTE(random_state=42)
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)
model_smote = LogisticRegression(max_iter=1000)
model_smote.fit(X_train_smote, y_train_smote)
y_pred_smote = model_smote.predict(X_test)
print("With SMOTE:")
print(classification_report(y_test, y_pred_smote))

# Using class weights
model_cw = LogisticRegression(class_weight='balanced', max_iter=1000)
model_cw.fit(X_train, y_train)
y_pred_cw = model_cw.predict(X_test)
print("With class weights:")
print(classification_report(y_test, y_pred_cw))

Important Notes

SMOTE works by creating new synthetic examples, not just copying existing ones.

Class weights are easier to use and add no synthetic data, but neither method is consistently better than the other, so compare both on your own data.

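Always check model performance on real test data that has not been resampled, and look at per-class precision, recall, and F1 rather than accuracy alone, to see if imbalance handling helps.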
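
The following plain-Python sketch shows why accuracy alone can hide the problem. The helper functions `accuracy` and `recall` are written here for illustration only; in practice `classification_report` from scikit-learn reports the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positive samples that were predicted positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0

# 90 majority samples, 10 minority samples, and a model that always predicts the majority
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.9
print(recall(y_true, y_pred))    # 0.0
```

A model that never predicts the minority class still scores 90% accuracy here while finding none of the minority cases.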

Summary

Imbalanced data can cause models to ignore small groups.

SMOTE creates new samples to balance classes.

Class weights tell the model to focus more on smaller classes.

Practice

1. What is the main purpose of using SMOTE in machine learning?

easy

A. To create synthetic samples for minority classes to balance the dataset

B. To reduce the size of the majority class by removing samples

C. To increase the number of features in the dataset

D. To randomly shuffle the dataset before training

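Solution

Step 1: Understand SMOTE's role in imbalanced data

SMOTE creates new synthetic examples for the smaller class so that the classes become more balanced.

Step 2: Compare options with SMOTE's function

Option B describes undersampling, option C changes features rather than samples, and option D only reorders the data. Only option A matches what SMOTE does.

Final Answer:

A. To create synthetic samples for minority classes to balance the dataset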
