
Imbalanced class handling (SMOTE, class weights) in ML Python - Model Metrics & Evaluation

Which metric matters for Imbalanced Class Handling and WHY

When classes are imbalanced, accuracy can be misleading: a model can score high simply by always predicting the majority class. Precision, Recall, and F1-score are more informative. Recall measures how many of the actual minority cases we catch, Precision measures how many of the predicted minority cases are correct, and F1-score balances the two. Together they show whether the model is truly learning the minority class or just ignoring it.

Confusion Matrix Example
      Actual \ Predicted  | Positive | Negative
      --------------------|----------|---------
      Positive (Minority) |    40    |    10
      Negative (Majority) |    20    |   930

      Total samples = 1000

      TP = 40, FN = 10, FP = 20, TN = 930

From this matrix:

  • Precision = 40 / (40 + 20) = 0.67
  • Recall = 40 / (40 + 10) = 0.80
  • F1-score = 2 * (0.67 * 0.80) / (0.67 + 0.80) ≈ 0.73
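
These numbers can be checked with a few lines of plain Python; the counts come straight from the matrix above:

```python
# Counts from the confusion matrix above
TP, FN, FP, TN = 40, 10, 20, 930

precision = TP / (TP + FP)          # 40 / 60
recall = TP / (TP + FN)             # 40 / 50
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.2f}")  # 0.67
print(f"Recall:    {recall:.2f}")     # 0.80
print(f"F1-score:  {f1:.2f}")         # 0.73
```
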

Precision vs Recall Tradeoff with Examples

In imbalanced data, improving one metric often lowers the other:

  • High Precision, Low Recall: The model is very sure when it predicts minority class but misses many actual minority cases. Example: A spam filter that rarely marks good emails as spam but misses many spam emails.
  • High Recall, Low Precision: The model catches most minority cases but also has many false alarms. Example: A cancer detector that finds almost all cancer cases but sometimes wrongly flags healthy people.

SMOTE and class weights both help balance this tradeoff: SMOTE generates synthetic minority examples so the model sees more of them, while class weights make errors on the minority class cost more during training.
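
To make the SMOTE side concrete, here is a minimal sketch of its core idea: creating a synthetic sample by interpolating between a minority point and one of its neighbors. This is an illustration only, not the imbalanced-learn implementation, and the function name `smote_sample` is made up for this example:

```python
import random

def smote_sample(x, neighbor, rng=random):
    """Create one synthetic minority sample by interpolating between a
    minority point and one of its nearest neighbors (SMOTE's core idea)."""
    gap = rng.random()  # random position on the segment between x and neighbor
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

# Two minority-class points in a 2-D feature space
x, neighbor = [1.0, 2.0], [3.0, 4.0]
random.seed(0)
synthetic = smote_sample(x, neighbor)
print(synthetic)  # lies somewhere on the segment between x and neighbor
```

In practice you would use `SMOTE` from the imbalanced-learn library, or pass `class_weight='balanced'` to a scikit-learn classifier, rather than rolling your own.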

What Good vs Bad Metric Values Look Like

For imbalanced class problems:

  • Good: Precision and Recall both above roughly 0.7 (the exact threshold depends on the application), with a similar F1-score, showing balanced detection and correctness.
  • Bad: High accuracy (e.g., 95%) but very low Recall (e.g., 10%) on the minority class, meaning the model effectively ignores minority cases.

Good models catch minority cases well without too many false alarms.
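
The "bad" case above is easy to reproduce on the 1000-sample example dataset (50 minority, 950 majority): a "model" that always predicts the majority class scores 95% accuracy with 0% recall:

```python
# The example dataset: 50 positives (minority), 950 negatives (majority)
y_true = [1] * 50 + [0] * 950
y_pred = [0] * 1000  # a "model" that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = tp / sum(y_true)

print(f"Accuracy: {accuracy:.0%}")  # 95% -- looks great
print(f"Recall:   {recall:.0%}")    # 0% -- every minority case is missed
```
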

Common Pitfalls in Metrics for Imbalanced Classes
  • Accuracy Paradox: High accuracy can hide poor minority class detection.
  • Data Leakage: If the same minority examples appear in both the training and test sets (for example, by oversampling before the train/test split), metrics look better but the model won't generalize.
  • Overfitting: Using SMOTE incorrectly can cause the model to memorize synthetic samples, inflating metrics.
  • Ignoring Class Distribution: Not adjusting metrics or thresholds for imbalance leads to misleading results.
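
Class weights address the last pitfall by scaling the cost of errors per class. The snippet below reproduces the "balanced" heuristic that scikit-learn uses, w_c = n_samples / (n_classes * n_c); the helper name `balanced_class_weights` is ours, not a library function:

```python
from collections import Counter

def balanced_class_weights(y):
    """Weights following the 'balanced' heuristic used by scikit-learn:
    w_c = n_samples / (n_classes * n_c)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

y = [1] * 50 + [0] * 950  # the imbalanced example from above
weights = balanced_class_weights(y)
print(weights)  # minority class (1) weighted ~19x more than majority (0)
```

Passing such weights (or simply `class_weight='balanced'`) to a scikit-learn classifier makes each minority-class mistake count far more during training.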

Self Check

Your model has 98% accuracy but only 12% recall on the minority (fraud) class. Is it good for production?

Answer: No. The model misses 88% of fraud cases, which is dangerous. Despite high accuracy, it fails to detect most fraud. You should improve recall using techniques like SMOTE or class weights.

Key Result
For imbalanced classes, focus on Precision, Recall, and F1-score rather than accuracy to truly measure minority class detection.