torchvision detection models in PyTorch - Model Metrics & Evaluation

Which metric matters for torchvision detection models and WHY

For object detection models like those in torchvision, the key metric is Mean Average Precision (mAP). mAP summarizes how well the model both finds and correctly labels objects: it balances precision (how many detected objects are correct) against recall (how many real objects are found), and averages the result across classes. Since detection involves both locating and classifying objects, mAP gives a clear picture of overall performance.

Other useful metrics include precision and recall at different Intersection over Union (IoU) thresholds, which control how strictly predicted boxes must overlap ground-truth boxes to count as correct.
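IoU is the overlap between two boxes divided by the area of their union. A minimal sketch, assuming `[x1, y1, x2, y2]` box coordinates (the function name `iou` is illustrative):

```python
def iou(box_a, box_b):
    """Intersection over Union for two [x1, y1, x2, y2] boxes."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

In practice, `torchvision.ops.box_iou` computes this for whole tensors of boxes at once; the pure-Python version above just makes the definition explicit.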

Confusion matrix or equivalent visualization

In object detection, confusion matrices are less straightforward because predictions include bounding boxes as well as classes. Instead, we count True Positives (TP), False Positives (FP), and False Negatives (FN) by matching predicted boxes to ground-truth boxes at an IoU threshold (commonly 0.5).

Class: Cat
+----------------+----------------+----------------+
|                | Predicted Cat  | Predicted Not  |
|                |                | Cat            |
+----------------+----------------+----------------+
| Actual Cat     | TP = 80        | FN = 20        |
+----------------+----------------+----------------+
| Actual Not Cat | FP = 15        | TN = (ignored) |
+----------------+----------------+----------------+

Note: TN (True Negative) is not usually counted in detection metrics because the background is large and undefined.
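The matching step described above can be sketched as a greedy procedure: take predictions in order of confidence, match each to the best still-unmatched ground-truth box, and count it as a TP only if the IoU clears the threshold. This is a simplified single-class sketch (the names `match_detections` and `iou` are illustrative; real evaluators like pycocotools also handle classes and multiple IoU thresholds):

```python
def iou(box_a, box_b):
    """Intersection over Union for two [x1, y1, x2, y2] boxes."""
    inter = (max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
             * max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])))
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, gts, iou_thresh=0.5):
    """Count TP/FP/FN for one image and one class.
    preds: list of (box, score); gts: list of ground-truth boxes."""
    preds = sorted(preds, key=lambda p: p[1], reverse=True)  # highest confidence first
    matched = set()
    tp = fp = 0
    for box, _score in preds:
        # Find the best-overlapping ground-truth box not yet claimed.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gts) - len(matched)  # ground-truth boxes the model never found
    return tp, fp, fn
```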
    
Precision vs Recall tradeoff with concrete examples

Imagine a security camera detecting people:

  • High Precision, Low Recall: The model only reports people when very sure. It misses some people (low recall) but rarely mistakes objects for people (high precision). Good if false alarms are costly.
  • High Recall, Low Precision: The model tries to find every person, even if unsure. It finds almost all people (high recall) but sometimes mistakes objects for people (low precision). Good if missing a person is worse than false alarms.

mAP balances these by averaging precision across recall levels; COCO-style mAP additionally averages over IoU thresholds from 0.5 to 0.95.
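The security-camera tradeoff above falls straight out of the TP/FP/FN counts. A small sketch with illustrative numbers (the function name and the counts are assumptions, not measured values):

```python
def precision_recall(tp, fp, fn):
    """Precision = correct detections / all detections;
    recall = correct detections / all real objects."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Cautious model: only reports people when very sure.
strict = precision_recall(tp=40, fp=2, fn=60)    # high precision, low recall
# Eager model: reports every possible person.
lenient = precision_recall(tp=95, fp=40, fn=5)   # high recall, lower precision
```

Raising the confidence threshold moves a detector from the `lenient` regime toward the `strict` one; mAP sidesteps the choice by integrating precision over all recall levels.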

What "good" vs "bad" metric values look like for torchvision detection models

For mAP (usually reported as a percentage):

  • Good: mAP above 50% on challenging datasets like COCO is strong; for reference, torchvision's pretrained Faster R-CNN ResNet-50 FPN reports a box mAP of about 37 on COCO val2017.
  • Bad: mAP below 20% means the model struggles to find or correctly label objects.

Precision and recall values closer to 1.0 (or 100%) are better, but often there is a tradeoff.

Common pitfalls in metrics for detection models

  • Ignoring IoU thresholds: Counting predictions as correct without checking if bounding boxes overlap enough can inflate metrics.
  • Data leakage: Testing on images the model saw during training gives unrealistically high scores.
  • Overfitting: Very high training mAP but low test mAP means the model memorizes training images but fails to generalize.
  • Ignoring class imbalance: Some classes may be rare, so overall mAP can hide poor performance on those classes.
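The class-imbalance pitfall is easy to see in numbers: because mAP is a plain mean of per-class APs, a respectable-looking overall score can coexist with near-total failure on a rare class. A sketch with illustrative per-class AP values (the class names and numbers are made up):

```python
def mean_ap(per_class_ap):
    """mAP as the unweighted mean of per-class Average Precision."""
    return sum(per_class_ap.values()) / len(per_class_ap)

# Hypothetical per-class APs: the model is nearly useless on the rare class,
# yet the overall mAP still looks moderate.
aps = {"car": 0.72, "person": 0.68, "stop_sign": 0.05}
overall = mean_ap(aps)  # roughly 0.48
```

Always inspect the per-class breakdown, not just the aggregate.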

Self-check question

Your torchvision detection model has 98% accuracy but only 12% recall on detecting cars. Is it good for production? Why or why not?

Answer: No, it is not good. The high accuracy is misleading because it likely counts many true negatives (background). The very low recall means the model misses most cars, which is bad if detecting cars is important. You want both high precision and recall for reliable detection.
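The arithmetic behind that answer: if most evaluated regions are background, true negatives dominate accuracy. Illustrative counts (all numbers here are assumptions chosen to match the 98% / 12% scenario):

```python
# Suppose we score 10,000 candidate regions, of which only 100 contain cars.
tp, fn, fp, tn = 12, 88, 100, 9800

accuracy = (tp + tn) / (tp + fn + fp + tn)  # dominated by background TNs
recall = tp / (tp + fn)                     # fraction of real cars found

print(f"accuracy={accuracy:.4f}, recall={recall:.2f}")
```

Accuracy lands near 0.98 while recall is only 0.12: the model misses 88 of 100 cars, and accuracy never notices.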

Key Result
Mean Average Precision (mAP) is the key metric for torchvision detection models, balancing precision and recall over object localization and classification.