CNN architecture for image classification in PyTorch - Model Metrics & Evaluation

Which metric matters for CNN image classification and WHY

For image classification with CNNs, accuracy is usually the headline metric: the fraction of all images that the model labels correctly. But accuracy alone can mislead when classes are imbalanced, because a model can score well just by favoring the most common class.

So we also look at precision and recall for each class. Precision is the share of images predicted as a class that really belong to it. Recall is the share of images that really belong to a class that the model labels as that class. The F1 score is the harmonic mean of precision and recall, so it is high only when both are high.

Together, these metrics show whether the CNN recognizes each class reliably or confuses it with other classes.
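As a minimal sketch of these definitions, the helper below computes precision, recall, and F1 for one class from its raw counts. The function name `precision_recall_f1` and the example counts are made up for illustration; in a PyTorch workflow the counts would come from comparing predicted labels with true labels.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its raw counts."""
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical counts for one class: 8 correct hits, 2 false alarms, 4 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints: precision=0.80 recall=0.67 f1=0.73
```

Returning 0.0 when a denominator is zero is one common convention, not the only one; some libraries raise a warning instead.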

Confusion matrix example
                  Predicted
               Cat   Dog   Bird
Actual  Cat     50     2      3
        Dog      4    45      1
        Bird     2     3     40

Total samples = 150

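True Positives (Cat) = 50 (Cat images predicted as Cat)
False Positives (Cat) = 4 + 2 = 6 (Dog and Bird images predicted as Cat)
False Negatives (Cat) = 2 + 3 = 5 (Cat images predicted as Dog or Bird)

Precision (Cat) = 50 / (50 + 6) ≈ 0.89
Recall (Cat) = 50 / (50 + 5) ≈ 0.91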

This matrix shows how many images were correctly or wrongly classified by the CNN for each class.
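The same arithmetic can be repeated for every class. The sketch below hard-codes the matrix above as a nested list (rows are actual classes, columns are predicted classes) and derives accuracy plus per-class precision and recall. The helper name `per_class_metrics` is an illustrative choice, not part of any library.

```python
classes = ["Cat", "Dog", "Bird"]
# Rows are actual classes, columns are predicted classes.
matrix = [
    [50, 2, 3],
    [4, 45, 1],
    [2, 3, 40],
]


def per_class_metrics(matrix, classes):
    """Compute precision and recall for each class of a square confusion matrix."""
    metrics = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(classes))) - tp  # rest of column i
        fn = sum(matrix[i]) - tp  # rest of row i
        metrics[name] = {"precision": tp / (tp + fp), "recall": tp / (tp + fn)}
    return metrics


total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(classes))) / total
metrics = per_class_metrics(matrix, classes)

print(f"accuracy = {accuracy:.2f}")
for name, m in metrics.items():
    print(f"{name}: precision={m['precision']:.2f} recall={m['recall']:.2f}")
# prints:
# accuracy = 0.90
# Cat: precision=0.89 recall=0.91
# Dog: precision=0.90 recall=0.90
# Bird: precision=0.91 recall=0.89
```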

Precision vs Recall tradeoff with examples

Imagine a CNN that classifies animals in photos. If it has high precision for "Dog", it means when it says "Dog", it is usually right. But it might miss some dogs (low recall).

If it has high recall for "Dog", it finds almost all dogs but might wrongly label some cats as dogs (lower precision).

For some tasks, like medical image classification, high recall is critical so that cases of disease are not missed. For others, like sorting photos, high precision might matter more, because each wrong label is a visible mistake.
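One way to see the tradeoff is to treat "Dog" as a yes/no decision and vary the score threshold at which the model says "Dog". The sketch below uses made-up scores and labels, and the helper `precision_recall_at` is hypothetical. A multi-class CNN usually just picks the highest-scoring class, so thresholding is an assumption made here to expose the tradeoff.

```python
# Hypothetical (score, is_dog) pairs: the model's "Dog" probability for 10 photos
# and whether each photo really shows a dog. These values are made up for illustration.
samples = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False), (0.65, True),
    (0.55, False), (0.45, True), (0.30, False), (0.20, False), (0.10, False),
]


def precision_recall_at(threshold, samples):
    """Label a photo "Dog" when its score is >= threshold, then score that rule."""
    tp = sum(1 for score, is_dog in samples if score >= threshold and is_dog)
    fp = sum(1 for score, is_dog in samples if score >= threshold and not is_dog)
    fn = sum(1 for score, is_dog in samples if score < threshold and is_dog)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.4, 0.6, 0.85):
    p, r = precision_recall_at(threshold, samples)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
# prints:
# threshold=0.40 precision=0.71 recall=1.00
# threshold=0.60 precision=0.80 recall=0.80
# threshold=0.85 precision=1.00 recall=0.40
```

With this toy data, raising the threshold makes the model more cautious: precision climbs while recall drops.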

Good vs Bad metric values for CNN image classification
  • Good: As a rough guide, accuracy above 90% with precision and recall above 85% for every class suggests the CNN is reliable. The right thresholds depend on the task and the number of classes.
  • Bad: Accuracy around 50%, or precision/recall below 60%, means the CNN struggles to classify images correctly.
  • Warning sign: Very high accuracy but low recall on one class means the model misses many images of that class.
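A check like this can be automated. The sketch below flags every class whose precision or recall falls below a chosen floor. The 85% default is taken from the list above, while the function name `weak_classes` and the per-class numbers are hypothetical.

```python
def weak_classes(per_class, min_precision=0.85, min_recall=0.85):
    """Return names of classes whose precision or recall is below the given floor."""
    return [
        name
        for name, m in per_class.items()
        if m["precision"] < min_precision or m["recall"] < min_recall
    ]


# Hypothetical per-class results, for illustration only.
per_class = {
    "Cat": {"precision": 0.93, "recall": 0.12},
    "Dog": {"precision": 0.91, "recall": 0.94},
    "Bird": {"precision": 0.88, "recall": 0.90},
}
print(weak_classes(per_class))
# prints: ['Cat']
```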
Common pitfalls in CNN metrics
  • Accuracy paradox: Accuracy can be high when one class dominates the data, even if the model ignores the other classes.
  • Data leakage: If test images also appear in the training set, or are near-duplicates of training images, test metrics overstate real-world performance.
  • Overfitting: Training accuracy is high but test accuracy is low, meaning the CNN has memorized training images instead of learning features that generalize.
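For the data leakage pitfall, a simple first check is to look for test images whose bytes are identical to a training image. The sketch below hashes raw image bytes with `hashlib.sha256`. The file names and byte strings are placeholders, and the helper `exact_duplicates` is hypothetical. It only catches exact copies, not resized or re-encoded near-duplicates.

```python
import hashlib


def exact_duplicates(train_images, test_images):
    """Return test names whose raw bytes also appear in the training set."""
    train_hashes = {hashlib.sha256(data).hexdigest() for data in train_images.values()}
    return sorted(
        name
        for name, data in test_images.items()
        if hashlib.sha256(data).hexdigest() in train_hashes
    )


# Hypothetical in-memory "files": name -> raw bytes. In practice these would be read from disk.
train_images = {"train/cat_001.jpg": b"\x01\x02\x03", "train/dog_001.jpg": b"\x04\x05\x06"}
test_images = {"test/cat_900.jpg": b"\x01\x02\x03", "test/bird_900.jpg": b"\x07\x08\x09"}

print(exact_duplicates(train_images, test_images))
# prints: ['test/cat_900.jpg']
```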
Self-check question

Your CNN model has 98% accuracy but only 12% recall on the "cat" class. Is it good for production?

Answer: No. The model misses most cat images (low recall), so it is not reliable for detecting cats even if overall accuracy is high.
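To see how both numbers can be true at once, here is one hypothetical test set that produces them. All counts are assumptions chosen to match the question: cats are rare, so missing most of them barely dents overall accuracy.

```python
# Hypothetical test set chosen so the numbers match the question:
# 5000 images, of which only 100 are cats.
total_images = 5000
cat_images = 100
cats_found = 12                         # true positives for "cat"
cats_missed = cat_images - cats_found   # false negatives: 88
other_errors = 12                       # assumed mistakes on non-cat images

correct = total_images - cats_missed - other_errors
accuracy = correct / total_images
cat_recall = cats_found / cat_images

print(f"accuracy = {accuracy:.2%}, cat recall = {cat_recall:.2%}")
# prints: accuracy = 98.00%, cat recall = 12.00%
```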

Key Result
Accuracy shows overall correctness, but precision and recall reveal class-wise strengths and weaknesses in CNN image classification.