YOLO concept in PyTorch - Model Metrics & Evaluation

YOLO is a model that detects objects in images. It must balance finding all objects (Recall) against making sure each detection is correct (Precision). We use Precision, Recall, and mAP (mean Average Precision) to evaluate how well YOLO works. mAP is special because it summarizes detection quality across confidence levels and box-overlap (IoU) thresholds.
+-----------------------+----------------------------+
| Confusion matrix term | Meaning                    |
|-----------------------+----------------------------|
| True Positives (TP)   | Objects correctly found    |
| False Positives (FP)  | Wrong objects found        |
| False Negatives (FN)  | Objects missed             |
+-----------------------+----------------------------+
Total objects = TP + FN
Total detections = TP + FP
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
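These two formulas can be sketched directly in Python (the counts below are made up for illustration):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    # Precision: of everything the model detected, how much was correct?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real objects, how many did the model find?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: 80 correct detections, 20 false alarms, 10 missed objects
p, r = precision_recall(tp=80, fp=20, fn=10)
print(f"Precision = {p:.2f}, Recall = {r:.2f}")  # Precision = 0.80, Recall = 0.89
```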
For YOLO, we also check how close the predicted box is to the real box using IoU (Intersection over Union): the overlap area divided by the combined area of the two boxes. Only predictions with IoU above a threshold (commonly 0.5) count as TP.
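A minimal IoU computation for axis-aligned boxes in (x1, y1, x2, y2) format — a sketch; real YOLO code vectorizes this over tensors:

```python
def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes -> 1.0
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-shifted boxes -> 50/150 = 0.333...
```

With a 0.5 IoU threshold, the second prediction would NOT count as a TP even though it visibly overlaps the ground truth.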
Imagine a security camera that spots people. If YOLO has high Precision, it means it rarely says "person" when there is none (few false alarms). But it might miss some people (low Recall).
If YOLO has high Recall, it finds almost all people but might sometimes say "person" when there is none (more false alarms).
For safety applications, high Recall matters most: missing a person is worse than a false alarm. For counting objects in a store, high Precision matters most: false detections inflate the count.
- Good: Precision and Recall above 0.8, mAP above 0.7 means YOLO finds most objects correctly and misses few.
- Bad: Precision below 0.5 means many wrong detections. Recall below 0.5 means many objects missed. mAP below 0.4 means poor overall detection.
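To see where mAP comes from, here is a simplified sketch of Average Precision (AP) for one class. It assumes detections are already sorted by descending confidence and matched to ground truth, and it integrates the raw precision-recall curve without the precision interpolation that COCO-style evaluation applies; real mAP then averages AP over classes (and, for mAP@0.5:0.95, over IoU thresholds):

```python
def average_precision(matches, num_gt):
    # matches: True/False per detection, sorted by descending confidence
    # num_gt: total number of ground-truth objects for this class
    tp = fp = 0
    points = []  # (recall, precision) as we walk down the ranked list
    for is_tp in matches:
        tp += is_tp
        fp += not is_tp
        points.append((tp / num_gt, tp / (tp + fp)))
    # Area under the precision-recall curve (rectangle rule over recall steps)
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# 4 ground-truth objects; 3rd-ranked detection is a false positive
print(average_precision([True, True, False, True], num_gt=4))  # 0.6875
```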
- Accuracy paradox: High accuracy can be misleading if most images have no objects. The model might just say "no object" always.
- Data leakage: Testing on images very similar to the training set can produce misleadingly high metrics.
- Overfitting: Very high training mAP but low test mAP means the model memorized training images instead of learning general patterns.
- IoU threshold choice: A threshold that is too low inflates TP counts; one that is too high makes detection needlessly strict.
Your YOLO model has 98% accuracy but only 12% recall on detecting cars. Is it good for production?
Answer: No. The model misses most cars (low recall). It might say "no car" often, which inflates accuracy if many images have no cars. For car detection, missing cars is bad, so recall must improve.
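The quiz numbers can be reproduced with a toy confusion matrix. These counts are invented to match the stated 98% / 12% figures, framed as an image-level "car present?" decision:

```python
# Imagine 5000 image crops, of which only 100 actually contain a car.
tp, fn = 12, 88      # the model finds only 12 of the 100 cars -> 12% recall
fp, tn = 12, 4888    # few false alarms on the 4900 car-free crops

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
print(f"Accuracy = {accuracy:.0%}, Recall = {recall:.0%}")  # Accuracy = 98%, Recall = 12%
```

Because car-free crops dominate, saying "no car" almost everywhere keeps accuracy high while the model misses 88 of 100 cars.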