
Evaluation and confusion matrix in Computer Vision - Model Metrics & Evaluation

Which metric matters for this concept and WHY

In computer vision, especially for classification tasks, the confusion matrix helps us see how well the model predicts each class. Key metrics like accuracy, precision, recall, and F1 score come from this matrix. They tell us if the model is good at finding the right objects (high recall) and if its guesses are usually correct (high precision). This is important because some mistakes are worse than others depending on the task.

Confusion matrix or equivalent visualization (ASCII)
    Confusion Matrix Example (3 classes: Cat, Dog, Rabbit)

                  Predicted
              | Cat | Dog | Rabbit |
    Actual ---+-----+-----+--------+
    Cat       |  50 |   2 |      3 |
    Dog       |   4 |  45 |      1 |
    Rabbit    |   2 |   3 |     40 |

    Explanation:
    - 50 images of cats correctly predicted as cats (True Positives for Cat)
    - 2 cats wrongly predicted as dogs (False Negatives for Cat, False Positives for Dog)
    - And so on for other classes.
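The per-class numbers fall straight out of this matrix: the diagonal entry is the true positives, the rest of the column gives false positives, and the rest of the row gives false negatives. A minimal Python sketch using the example counts above:

```python
# Per-class precision, recall, and F1 from the 3-class matrix above.
# Rows = actual class, columns = predicted class (Cat, Dog, Rabbit).
matrix = [
    [50, 2, 3],   # actual Cat
    [4, 45, 1],   # actual Dog
    [2, 3, 40],   # actual Rabbit
]
classes = ["Cat", "Dog", "Rabbit"]

for i, name in enumerate(classes):
    tp = matrix[i][i]
    fp = sum(matrix[r][i] for r in range(3)) - tp   # column sum minus diagonal
    fn = sum(matrix[i]) - tp                        # row sum minus diagonal
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For Cat, this gives precision 50/56 ≈ 0.89 and recall 50/55 ≈ 0.91.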
    
Precision vs Recall tradeoff with concrete examples

Imagine a model that detects cats in photos.

  • High precision: When the model says "this is a cat," it is almost always right. Few wrong cat guesses. Good if you want to avoid false alarms.
  • High recall: The model finds almost all cats in the photos, even if it sometimes mistakes other animals for cats. Good if missing a cat is bad.

For example, if you want to find all cats for a rescue mission, recall is more important. But if you want to tag only real cats in a photo album, precision matters more.
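The tradeoff becomes concrete when you move the detector's decision threshold. A small sketch with made-up scores and labels (all values are illustrative, not from a real model):

```python
# Toy cat-detector confidence scores; 1 = actually a cat, 0 = not a cat.
# Raising the threshold trades recall for precision, and vice versa.
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    1,    0,    1,    0,    1,    0,    0,    0]

def precision_recall(threshold):
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.8, 0.5, 0.2):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

A strict threshold (0.8) catches only the most confident cats (high precision, low recall); a loose one (0.2) catches every cat but with more false alarms.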

What "good" vs "bad" metric values look like for this use case

Good metrics for a balanced computer vision classifier might be:

  • Accuracy above 90% on a balanced dataset
  • Precision and recall both above 85%
  • F1 score close to precision and recall, showing balance

Bad metrics might be:

  • Accuracy high but recall very low (model misses many objects)
  • Precision very low (many false alarms)
  • Confusion matrix shows many misclassifications between similar classes

Metrics pitfalls (accuracy paradox, data leakage, overfitting indicators)

  • Accuracy paradox: High accuracy can be misleading if classes are imbalanced. For example, if 95% of images are dogs, a model that always guesses dog gets 95% accuracy but is useless.
  • Data leakage: If test images overlap with or closely resemble the training images, metrics look better than they should, but the model won't work well in real life.
  • Overfitting: Very high training accuracy but low test accuracy means the model memorizes training images but can't generalize.
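The accuracy paradox is easy to demonstrate. A sketch using the 95%-dogs scenario from the bullet above, with a degenerate classifier that always predicts "dog":

```python
# Accuracy paradox on an imbalanced set: 95 dogs, 5 cats.
# A "model" that always predicts dog scores 95% accuracy yet 0% cat recall.
actual = ["dog"] * 95 + ["cat"] * 5
predicted = ["dog"] * 100   # always-dog classifier

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
cat_tp = sum(1 for a, p in zip(actual, predicted) if a == "cat" and p == "cat")
cat_recall = cat_tp / actual.count("cat")
print(f"accuracy={accuracy:.0%} cat_recall={cat_recall:.0%}")
```

Accuracy alone (95%) hides the fact that the model never finds a single cat.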

Self-check: Your model has 98% accuracy but 12% recall on fraud. Is it good?

No, it is not good for fraud detection. The 98% accuracy is misleading because fraud cases are rare. The 12% recall means the model finds only 12% of actual frauds, missing most fraud cases. For fraud detection, high recall is critical to catch as many frauds as possible.
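One set of hypothetical counts consistent with these numbers (assuming 10,000 transactions, 200 of them actual frauds; the exact totals are an illustration, not from the source):

```python
# Hypothetical confusion-matrix counts matching 98% accuracy, 12% recall.
tp, fn = 24, 176      # frauds caught vs. frauds missed -> recall = 24/200
tn, fp = 9776, 24     # legitimate transactions

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.0%} recall={recall:.0%} frauds missed={fn}")
```

The model lets 176 of 200 frauds through while still reporting a reassuring-looking 98% accuracy.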

Key Result
Confusion matrix metrics like precision, recall, and F1 score reveal model strengths and weaknesses beyond accuracy, crucial for reliable computer vision evaluation.