In many datasets, one class is much larger than the others. A model trained on such imbalanced data can learn to mostly predict the majority class and perform poorly on the smaller one. Techniques for handling class imbalance help the model learn well for all groups.
Handling imbalanced classes in Python machine learning (SMOTE, class weights)
from imblearn.over_sampling import SMOTE

# Oversample the minority class with SMOTE
smote = SMOTE(sampling_strategy='minority')
X_resampled, y_resampled = smote.fit_resample(X, y)

# For class weights in sklearn models:
model = SomeClassifier(class_weight='balanced')
model.fit(X_train, y_train)
SMOTE creates new examples for the smaller group by interpolating between existing minority samples and their nearest neighbors.
Class weights tell the model to pay more attention to smaller groups during training.
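As a sketch of the idea behind scikit-learn's 'balanced' option, each class weight can be computed as n_samples / (n_classes * count_of_class), so rarer classes get larger weights. The label counts below are made-up illustration values:

```python
from collections import Counter

# Hypothetical labels: 9 majority-class samples, 3 minority-class samples
y = [0] * 9 + [1] * 3

counts = Counter(y)
n_samples = len(y)
n_classes = len(counts)

# 'balanced' weighting: weight is inversely proportional to class frequency
weights = {c: n_samples / (n_classes * n) for c, n in counts.items()}
print(weights)  # class 1 gets three times the weight of class 0
```

With these weights, every mistake on the minority class costs the model three times as much during training, which discourages it from ignoring the small class.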
# Oversample only the minority class
smote = SMOTE(sampling_strategy='minority')
X_res, y_res = smote.fit_resample(X, y)

# Oversample until the minority class is half the size of the majority class
smote = SMOTE(sampling_strategy=0.5)
X_res, y_res = smote.fit_resample(X, y)

# Weight classes inversely to their frequency
model = LogisticRegression(class_weight='balanced')
model.fit(X_train, y_train)

# Give class 1 five times the weight of class 0
model = RandomForestClassifier(class_weight={0: 1, 1: 5})
model.fit(X_train, y_train)

This program creates a dataset where one class is much smaller. It trains a logistic regression model three ways: without any imbalance handling, with SMOTE to add synthetic minority samples, and with class weights so the model pays more attention to the small class. It then prints a classification report for each approach.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

# Create imbalanced data (90% class 0, 10% class 1)
X, y = make_classification(n_classes=2, class_sep=2, weights=[0.9, 0.1],
                           n_informative=3, n_redundant=1, flip_y=0,
                           n_features=5, n_clusters_per_class=1,
                           n_samples=200, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Without handling imbalance
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Without imbalance handling:")
print(classification_report(y_test, y_pred))

# Using SMOTE (resample only the training set, never the test set)
smote = SMOTE(random_state=42)
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)
model_smote = LogisticRegression(max_iter=1000)
model_smote.fit(X_train_smote, y_train_smote)
y_pred_smote = model_smote.predict(X_test)
print("With SMOTE:")
print(classification_report(y_test, y_pred_smote))

# Using class weights
model_cw = LogisticRegression(class_weight='balanced', max_iter=1000)
model_cw.fit(X_train, y_train)
y_pred_cw = model_cw.predict(X_test)
print("With class weights:")
print(classification_report(y_test, y_pred_cw))
SMOTE works by creating new synthetic examples, not just copying existing ones.
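A minimal sketch of the interpolation idea behind this (not the library's actual implementation, which also performs nearest-neighbor search): a synthetic point lies somewhere on the line segment between a minority sample and one of its neighbors. The two points below are made-up illustration values:

```python
import random

random.seed(0)

# Two hypothetical minority-class points (2 features each)
x = [1.0, 2.0]
neighbor = [3.0, 4.0]

# SMOTE-style interpolation: pick a random spot between the two points
gap = random.random()  # a value in [0, 1)
synthetic = [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

print(synthetic)  # a new point that is neither x nor neighbor
```

Because the synthetic point is a blend rather than a copy, the model sees slightly different minority examples instead of the same one repeated, which reduces overfitting compared with plain duplication.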
Class weights are easier to use but may not always improve results as much as SMOTE.
Always check model performance on real test data to see if imbalance handling helps.
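One reason to check carefully is that plain accuracy can look good even when the minority class is mostly missed, so it helps to inspect minority-class recall as well. A sketch with made-up labels and predictions:

```python
# Hypothetical test labels and predictions (class 1 is the minority)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the minority class: fraction of true 1s the model caught
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_pos = sum(t == 1 for t in y_true)
minority_recall = true_pos / actual_pos

print(accuracy)         # 0.9 -- looks good
print(minority_recall)  # 0.5 -- half the minority class is missed
```

This is why the classification reports in the program above are more informative than a single accuracy number: they break precision and recall down per class.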
Imbalanced data can cause models to ignore small groups.
SMOTE creates new samples to balance classes.
Class weights tell the model to focus more on smaller classes.