Computer Vision - Model Metrics & Evaluation

Metrics & Evaluation - CV applications (autonomous driving, medical, retail)
Which metric matters for CV applications and WHY

In computer vision tasks like autonomous driving, medical imaging, and retail, the choice of metric depends on the goal:

  • Autonomous driving: High Recall is critical to detect all obstacles and pedestrians to avoid accidents. Precision is also important to reduce false alarms that may cause unnecessary stops.
  • Medical imaging: High Recall ensures no disease cases are missed, which is vital for patient safety. Precision helps avoid false positives that can cause stress and extra tests.
  • Retail (e.g., product detection): Balanced Precision and Recall matter to correctly identify products without too many mistakes, improving customer experience and inventory management.

Overall, Precision, Recall, and F1-score are key metrics. Accuracy alone can be misleading if classes are imbalanced.

Confusion Matrix Example

For a medical image classifier detecting disease (Positive) vs healthy (Negative):

      |                 | Predicted Positive      | Predicted Negative      |
      |-----------------|-------------------------|-------------------------|
      | Actual Positive | True Positive (TP): 90  | False Negative (FN): 10 |
      | Actual Negative | False Positive (FP): 15 | True Negative (TN): 85  |

Totals: TP + FP + TN + FN = 90 + 15 + 85 + 10 = 200 samples

From this matrix:

  • Precision = 90 / (90 + 15) = 0.857
  • Recall = 90 / (90 + 10) = 0.9
  • F1-score = 2 * (0.857 * 0.9) / (0.857 + 0.9) ≈ 0.878
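The calculations above can be checked with a few lines of plain Python, using the TP/FN/FP/TN counts from the confusion matrix in this example:

```python
# Metrics from the confusion matrix above (TP=90, FN=10, FP=15, TN=85).
tp, fn, fp, tn = 90, 10, 15, 85

precision = tp / (tp + fp)                         # 90 / 105
recall = tp / (tp + fn)                            # 90 / 100
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fn + fp + tn)         # 175 / 200

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

Note that accuracy here is 0.875, lower than both F1 and recall; on a balanced set like this one, accuracy is informative, but the pitfalls section below explains why it breaks down under class imbalance.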
Precision vs Recall Tradeoff with Examples

In CV applications, improving one metric can reduce the other:

  • Autonomous driving: Missing a pedestrian (low recall) can cause accidents, so recall is prioritized even if precision drops (more false alarms).
  • Medical imaging: Missing a cancer case (low recall) is dangerous, so recall is critical. But too many false positives (low precision) cause unnecessary tests.
  • Retail: False positives (low precision) may confuse customers, while false negatives (low recall) mean missed products. Balanced metrics improve shopping experience.

Choosing the right balance depends on the risk and cost of errors in each application.
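The tradeoff is usually controlled by the model's decision threshold. A minimal sketch with made-up scores and labels (not from any real model) shows the effect: lowering the threshold catches more true positives (higher recall) but also admits more false alarms (lower precision).

```python
# Toy illustration of the precision/recall tradeoff.
# Scores and labels are invented example data, not real model outputs.
scores = [0.95, 0.9, 0.85, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1,    1,   1,    0,   1,   0,   0,   0]

def precision_recall(threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

# A strict threshold gives perfect precision but misses one positive;
# relaxing it recovers the missed positive at the cost of a false alarm.
print(precision_recall(0.7))
print(precision_recall(0.45))
```

For safety-critical tasks like pedestrian detection, the threshold would be pushed toward the high-recall end of this curve, accepting the drop in precision.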

What Good vs Bad Metric Values Look Like
  • Good: Recall and precision above 0.85 in medical and autonomous driving tasks show reliable detection with few misses and false alarms.
  • Bad: High accuracy (e.g., 95%) but low recall (e.g., 50%) means many positive cases are missed, which is unsafe in medical or driving contexts.
  • In retail, precision or recall below 0.7 may cause poor product recognition and customer dissatisfaction.
Common Metrics Pitfalls
  • Accuracy paradox: High accuracy can hide poor recall if data is imbalanced (e.g., many healthy images, few disease cases).
  • Data leakage: If test images are too similar to training, metrics look better but model fails in real use.
  • Overfitting indicators: Very high training metrics but low test metrics show model memorizes training data, not generalizing well.
  • Ignoring class imbalance: Not using metrics like F1-score or AUC can mislead about model quality.
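The accuracy paradox in the first bullet is easy to demonstrate. In this sketch (with illustrative counts: 950 healthy images, 50 disease images), a "model" that predicts healthy for every input reaches 95% accuracy while detecting zero disease cases:

```python
# Accuracy paradox on an imbalanced set: an always-negative "model"
# scores 95% accuracy yet 0% recall on the disease class.
# Class counts (950 healthy, 50 disease) are illustrative.
labels = [0] * 950 + [1] * 50   # 0 = healthy, 1 = disease
preds = [0] * 1000              # predict "healthy" for everything

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

This is why imbalance-aware metrics such as recall, F1-score, or AUC belong in any evaluation of rare-positive problems like disease screening.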
Self-Check Question

Your autonomous driving model has 98% accuracy but only 12% recall on detecting pedestrians. Is it good for production? Why or why not?

Answer: No, it is not good. The low recall means the model misses 88% of pedestrians, which is very dangerous. High accuracy likely comes from many non-pedestrian images. Recall is critical here to avoid accidents.

Key Result
In CV applications, recall is often most critical to avoid missing important cases, but precision and F1-score balance are also key depending on context.