
CNN architecture review in Computer Vision - Model Metrics & Evaluation

Which metrics matter in a CNN architecture review, and why

When reviewing a CNN (Convolutional Neural Network) architecture, the key metrics to focus on are accuracy, precision, recall, and F1 score. These metrics tell us how well the CNN is recognizing patterns and making correct predictions.

Accuracy shows overall correctness, but it can be misleading if classes are unbalanced. Precision tells us how many predicted positives are actually correct, which is important when false alarms are costly. Recall tells us how many real positives the model finds, which matters when missing a positive is bad. F1 score balances precision and recall, giving a single number to compare models.

For CNNs used in image tasks, these metrics help us understand if the architecture is good at learning useful features and generalizing to new images.
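The four metrics above can be sketched from scratch for the binary case. This is a minimal illustration with made-up labels, not output from a real CNN:

```python
# Minimal sketch: binary-classification metrics computed from scratch.
# y_true / y_pred are illustrative labels, not from any real model.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives

accuracy = (tp + tn) / len(y_true)         # overall correctness: 0.7
precision = tp / (tp + fp)                 # of predicted positives, how many are right
recall = tp / (tp + fn)                    # of real positives, how many are found: 0.5
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```

In practice a library such as scikit-learn would compute these, but the hand-rolled version makes the definitions explicit.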

Confusion matrix example for CNN predictions

      Actual \ Predicted | Cat | Dog | Other
      -------------------|-----|-----|------
      Cat                | 50  |  5  |  10
      Dog                |  3  | 45  |   7
      Other              |  2  |  4  |  60

This matrix shows how many images of each true class were predicted as each class. For example, 50 cat images were correctly predicted as cats (true positives for cat), 5 cat images were wrongly predicted as dogs (false negatives for cat and false positives for dog), and so on.
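
Per-class precision and recall fall straight out of this matrix: precision for a class reads down its column, recall reads across its row. A short sketch using the numbers above:

```python
# Per-class precision and recall from the confusion matrix above
# (rows = actual class, columns = predicted class).
labels = ["Cat", "Dog", "Other"]
cm = [
    [50, 5, 10],   # actual Cat
    [3, 45, 7],    # actual Dog
    [2, 4, 60],    # actual Other
]

metrics = {}
for i, label in enumerate(labels):
    tp = cm[i][i]                          # diagonal: correctly classified
    col_sum = sum(row[i] for row in cm)    # everything *predicted* as this class
    row_sum = sum(cm[i])                   # everything *actually* this class
    metrics[label] = {
        "precision": tp / col_sum,
        "recall": tp / row_sum,
    }
```

For "Cat" this gives precision 50/55 ≈ 0.91 but recall only 50/65 ≈ 0.77: the model is trustworthy when it says "cat", yet still misses 15 of the 65 real cats.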

Precision vs Recall tradeoff with CNN example

Imagine a CNN that detects cats in photos. If we want to be very sure when the model says "cat" (high precision), it might miss some cats (lower recall). This means fewer false alarms but more missed cats.

On the other hand, if we want to find every cat possible (high recall), the model might sometimes say "cat" when it is not (lower precision). This means catching all cats but with more false alarms.

Choosing the right balance depends on the goal. For example, in a pet app, missing a cat might be worse, so recall is more important. In a security camera, false alarms might be annoying, so precision matters more.
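
The tradeoff is usually controlled by the decision threshold on the model's confidence score. The sketch below uses hypothetical "cat" scores (not real model output) to show how a strict threshold buys precision at the cost of recall, and a lenient one does the reverse:

```python
# Hypothetical confidence scores for "cat" and the true labels (1 = cat).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20]
labels = [1,    1,    0,    1,    0,    1,    0,    0]

def precision_recall(threshold):
    """Precision and recall when 'cat' is declared at score >= threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    return tp / (tp + fp), tp / (tp + fn)

# Strict threshold: no false alarms, but half the cats are missed.
p_hi, r_hi = precision_recall(0.85)   # precision 1.0, recall 0.5
# Lenient threshold: every cat found, but precision drops.
p_lo, r_lo = precision_recall(0.35)   # precision ~0.67, recall 1.0
```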

What good vs bad metric values look like for CNNs

Good CNN metrics:

  • Accuracy above 85% on a balanced dataset
  • Precision and recall both above 80%
  • F1 score close to precision and recall, showing balance
  • Confusion matrix with most predictions on the diagonal (correct class)

Bad CNN metrics:

  • High accuracy but very low recall or precision (model guesses mostly one class)
  • F1 score much lower than precision or recall, showing imbalance
  • Confusion matrix with many off-diagonal errors (wrong predictions)
  • Very low accuracy (below 50%) indicating poor learning
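
The checklists above can be folded into a quick sanity check. The thresholds below are the illustrative values from the lists, not universal standards:

```python
# Rule-of-thumb health check using the illustrative thresholds from the
# lists above (85% accuracy, 80% precision/recall, balanced metrics).
def looks_healthy(accuracy, precision, recall, f1):
    balanced = abs(precision - recall) < 0.10  # F1 stays close when these agree
    return (accuracy > 0.85 and precision > 0.80
            and recall > 0.80 and f1 > 0.80 and balanced)
```

For example, `looks_healthy(0.90, 0.85, 0.84, 0.845)` passes, while a model with 98% accuracy but 12% recall fails immediately.
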

Common pitfalls when evaluating CNN metrics

  • Accuracy paradox: High accuracy can hide poor performance if classes are imbalanced.
  • Data leakage: If test images are too similar to training images, metrics look good but the model won't generalize.
  • Overfitting: Very high training accuracy but low test accuracy means the model memorizes training images rather than learning general features.
  • Ignoring class imbalance: Relying on accuracy alone, without precision, recall, or F1, can mislead about model quality.
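
The accuracy paradox in the first pitfall is easy to reproduce with made-up data: a "model" that always predicts the majority class on a 95/5 split looks accurate while being useless.

```python
# Accuracy-paradox sketch: always predict the majority class
# on an imbalanced dataset (95 negatives, 5 positives).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100   # never predicts "cat"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.95
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)   # 0.0 — every positive is missed
```
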

Self-check question

Your CNN model has 98% accuracy but only 12% recall on the "cat" class. Is it good for production? Why or why not?

Answer: No, it is not good. The low recall means the model misses most cats, even though overall accuracy is high. This likely happens because the dataset is imbalanced or the model predicts mostly other classes. For production, especially if finding cats is important, recall must improve.
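
One hypothetical set of counts that produces exactly these numbers (10,000 test images, only 200 of them cats) makes the problem concrete:

```python
# Hypothetical counts reproducing the self-check numbers: 10,000 test
# images, 200 cats; the model finds 24 cats, misses 176, and raises
# 24 false alarms. All numbers are illustrative.
total, cats = 10_000, 200
tp, fn, fp = 24, 176, 24
tn = total - tp - fn - fp          # 9,776 correct "not cat" predictions

accuracy = (tp + tn) / total       # 0.98 — dominated by the majority class
recall = tp / (tp + fn)            # 0.12 — most cats are missed
```

Accuracy is carried almost entirely by the 9,776 easy negatives, which is why it says nothing about the cat class.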

Key Result
For CNNs, balanced precision and recall with high accuracy and F1 score indicate a good architecture that learns well and generalizes.