For object detection models like the R-CNN family, the key metric is mean Average Precision (mAP). This metric measures how well the model finds and correctly labels objects in images. It balances both precision (how many detected objects are correct) and recall (how many true objects are found). We use mAP because object detection needs both accurate location and correct classification.
R-CNN family overview in Computer Vision - Model Metrics & Evaluation
Which metric matters for the R-CNN family, and why
Confusion matrix for object detection (simplified)
              | Predicted Object | Predicted No Object
--------------+------------------+--------------------
True Object   | TP               | FN
No Object     | FP               | TN

Total samples = TP + FP + FN + TN
Here, TP means the model correctly found an object, FP means it found something that is not an object, FN means it missed an object, and TN means it correctly ignored background.
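The table above translates directly into the precision and recall formulas. A minimal sketch, using made-up counts (the numbers 80, 20, 40 are illustrative, not from any real model):

```python
# Hypothetical confusion-matrix counts for one detector (illustrative only)
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)  # fraction of detections that are correct
recall = tp / (tp + fn)     # fraction of true objects that were found

print(f"precision = {precision:.2f}")  # 0.80
print(f"recall    = {recall:.2f}")     # 0.67
```

Note that TN does not appear in either formula, which is why detection metrics focus on precision and recall rather than accuracy.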
Precision vs Recall tradeoff with examples
In object detection:
- High precision means most detected objects are correct. This is important when false alarms are costly, like in self-driving cars where wrong detections can cause accidents.
- High recall means the model finds most of the true objects. This is important in security cameras where missing a person is risky.
Improving one often lowers the other. For example, setting a high confidence threshold increases precision but lowers recall.
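The threshold effect described above can be sketched with a toy ranked list of detections. All scores and labels below are invented for illustration, not real model output:

```python
# Sketch: how a confidence threshold trades precision against recall.
# Each detection is (confidence score, whether it matches a true object).
detections = [
    (0.95, True), (0.90, True), (0.80, False), (0.75, True),
    (0.60, True), (0.50, False), (0.40, True), (0.30, False),
]
total_true_objects = 6  # ground-truth objects in the (hypothetical) images

for threshold in (0.9, 0.5):
    kept = [ok for conf, ok in detections if conf >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    precision = tp / len(kept)
    recall = tp / total_true_objects
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With the strict threshold (0.9) only the two most confident detections survive, giving perfect precision but low recall; lowering the threshold to 0.5 keeps more detections, raising recall at the cost of precision.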
What good vs bad metric values look like for R-CNN models
Good R-CNN model:
- mAP above 0.7 (70%) at an IoU threshold of 0.5, as on PASCAL VOC; note that COCO mAP is averaged over IoU thresholds from 0.5 to 0.95, so it runs lower, and scores around 0.4-0.5 are already strong there
- Precision and recall balanced around 0.7 or higher
- Confusion matrix shows low FP and FN counts
Bad R-CNN model:
- mAP below 0.4 (40%) indicating poor detection or classification
- Very low recall (e.g., 0.2) means many objects missed
- Very low precision (e.g., 0.3) means many false detections
Common pitfalls in evaluating R-CNN models
- Accuracy paradox: High accuracy can be misleading if the dataset has many background images and few objects.
- Data leakage: Using test images during training inflates metrics falsely.
- Overfitting: Very high training mAP but low test mAP means the model memorizes training images and fails to generalize.
- Ignoring localization quality: Only counting classification accuracy without checking bounding box overlap (IoU) can give wrong impressions.
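The last pitfall hinges on IoU, which measures how well a predicted box overlaps a ground-truth box. A minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2); the coordinates in the example call are illustrative:

```python
# Intersection over Union (IoU) for two axis-aligned boxes (x1, y1, x2, y2).
def iou(a, b):
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # 0 if boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Half-overlapping boxes: intersection 50, union 150 -> IoU = 1/3
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Under the common PASCAL VOC rule, a detection only counts as a true positive when its IoU with a ground-truth box is at least 0.5, so a correct class label on a badly placed box is still a false positive.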
Self-check question
Your R-CNN model has 98% accuracy but only 12% recall on detecting pedestrians. Is it good for production? Why or why not?
Answer: No, it is not good. The high accuracy is misleading because most of the image is background (no pedestrians). The very low recall means the model misses most pedestrians, which is dangerous in real applications like self-driving cars or surveillance.
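The accuracy paradox in this self-check can be reproduced with a small sketch. The counts below are invented to match the scenario (heavy background imbalance), not taken from a real dataset:

```python
# Accuracy paradox with made-up counts: 5000 image regions, only 50
# of which contain pedestrians, so background dominates.
tp, fn = 6, 44        # the model finds just 6 of 50 pedestrians
fp, tn = 6, 4944      # almost all background is (trivially) classified right

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(f"accuracy = {accuracy:.2%}")  # 99.00% -- looks great
print(f"recall   = {recall:.2%}")    # 12.00% -- misses 88% of pedestrians
```

The headline accuracy is driven almost entirely by true negatives on background, which is exactly why recall (and mAP) must be checked separately.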
Key Result
Mean Average Precision (mAP) is the key metric for R-CNN models, balancing precision and recall to measure detection quality.