For object detection models like the R-CNN family, the key metric is mean Average Precision (mAP). This metric measures how well the model finds and correctly labels objects in images. It balances both precision (how many detected objects are correct) and recall (how many true objects are found). We use mAP because object detection needs both accurate location and correct classification.
R-CNN family overview in Computer Vision - Model Metrics & Evaluation
Which metric matters for the R-CNN family, and why
Confusion matrix for object detection (simplified)
              | Predicted Object | Predicted No Object
--------------+------------------+--------------------
True Object   | TP               | FN
No Object     | FP               | TN

Total samples = TP + FP + FN + TN
Here, TP means the model correctly found an object, FP means it found something that is not an object, FN means it missed an object, and TN means it correctly ignored background.
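The table above translates directly into the precision and recall formulas. A minimal sketch, using made-up counts (the numbers 80, 20, 40 are illustrative, not from any real model):

```python
# Hypothetical confusion-matrix counts for one detector (illustrative only)
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)  # fraction of detections that are correct
recall = tp / (tp + fn)     # fraction of true objects that were found

print(f"precision = {precision:.2f}")  # 0.80
print(f"recall    = {recall:.2f}")     # 0.67
```

Note that TN does not appear in either formula, which is why detection metrics focus on precision and recall rather than accuracy.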
Precision vs Recall tradeoff with examples
In object detection:
- High precision means most detected objects are correct. This is important when false alarms are costly, like in self-driving cars where wrong detections can cause accidents.
- High recall means the model finds most of the true objects. This is important in security cameras where missing a person is risky.
Improving one often lowers the other. For example, setting a high confidence threshold increases precision but lowers recall.
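The threshold effect described above can be sketched with a toy ranked list of detections. All scores and labels below are invented for illustration, not real model output:

```python
# Sketch: how a confidence threshold trades precision against recall.
# Each detection is (confidence score, whether it matches a true object).
detections = [
    (0.95, True), (0.90, True), (0.80, False), (0.75, True),
    (0.60, True), (0.50, False), (0.40, True), (0.30, False),
]
total_true_objects = 6  # ground-truth objects in the (hypothetical) images

for threshold in (0.9, 0.5):
    kept = [ok for conf, ok in detections if conf >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    precision = tp / len(kept)
    recall = tp / total_true_objects
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With the strict threshold (0.9) only the two most confident detections survive, giving perfect precision but low recall; lowering the threshold to 0.5 keeps more detections, raising recall at the cost of precision.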
What good vs bad metric values look like for R-CNN models
Good R-CNN model:
- mAP above 0.7 (70%) at an IoU threshold of 0.5, as on PASCAL VOC; note that COCO mAP is averaged over IoU thresholds from 0.5 to 0.95, so it runs lower, and scores around 0.4-0.5 are already strong there
- Precision and recall balanced around 0.7 or higher
- Confusion matrix shows low FP and FN counts
Bad R-CNN model:
- mAP below 0.4 (40%) indicating poor detection or classification
- Very low recall (e.g., 0.2) means many objects missed
- Very low precision (e.g., 0.3) means many false detections
Common pitfalls in evaluating R-CNN models
- Accuracy paradox: High accuracy can be misleading if the dataset has many background images and few objects.
- Data leakage: Using test images during training inflates metrics falsely.
- Overfitting: Very high training mAP but low test mAP means the model memorizes training images and fails to generalize.
- Ignoring localization quality: Only counting classification accuracy without checking bounding box overlap (IoU) can give wrong impressions.
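The last pitfall hinges on IoU, which measures how well a predicted box overlaps a ground-truth box. A minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2); the coordinates in the example call are illustrative:

```python
# Intersection over Union (IoU) for two axis-aligned boxes (x1, y1, x2, y2).
def iou(a, b):
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # 0 if boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Half-overlapping boxes: intersection 50, union 150 -> IoU = 1/3
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Under the common PASCAL VOC rule, a detection only counts as a true positive when its IoU with a ground-truth box is at least 0.5, so a correct class label on a badly placed box is still a false positive.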
Self-check question
Your R-CNN model has 98% accuracy but only 12% recall on detecting pedestrians. Is it good for production? Why or why not?
Answer: No, it is not good. The high accuracy is misleading because most of the image is background (no pedestrians). The very low recall means the model misses most pedestrians, which is dangerous in real applications like self-driving cars or surveillance.
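The accuracy paradox in this self-check can be reproduced with a small sketch. The counts below are invented to match the scenario (heavy background imbalance), not taken from a real dataset:

```python
# Accuracy paradox with made-up counts: 5000 image regions, only 50
# of which contain pedestrians, so background dominates.
tp, fn = 6, 44        # the model finds just 6 of 50 pedestrians
fp, tn = 6, 4944      # almost all background is (trivially) classified right

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(f"accuracy = {accuracy:.2%}")  # 99.00% -- looks great
print(f"recall   = {recall:.2%}")    # 12.00% -- misses 88% of pedestrians
```

The headline accuracy is driven almost entirely by true negatives on background, which is exactly why recall (and mAP) must be checked separately.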
Key Result
Mean Average Precision (mAP) is the key metric for R-CNN models, balancing precision and recall to measure detection quality.