
Faster R-CNN usage in PyTorch - Model Metrics & Evaluation

Which metric matters for Faster R-CNN and WHY

Faster R-CNN is used for object detection. The key metric is mean Average Precision (mAP). It measures how well the model finds objects and how accurate the bounding boxes are. For each class, Average Precision (AP) summarizes the precision-recall curve; mAP averages AP over all classes (and, in COCO-style evaluation, over several IoU thresholds as well). This tells us whether the model finds most objects (high recall) and whether the found objects are correct (high precision).

Other useful metrics include Precision and Recall for each class, and Intersection over Union (IoU) to check how close predicted boxes are to real boxes.
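
As a concrete illustration, IoU for axis-aligned boxes can be computed in a few lines. This is a minimal sketch using plain Python tuples in (x1, y1, x2, y2) format:

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x1, y1, x2, y2) format."""
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))   # identical boxes -> 1.0
print(iou((0, 0, 10, 10), (20, 20, 30, 30))) # disjoint boxes -> 0.0
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))   # half overlap -> 50/150 ~ 0.333
```

An IoU of 1.0 means a perfect match, 0.0 means no overlap at all, which is why it works as a matching criterion between predicted and real boxes.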

Confusion matrix for object detection

Object detection does not use a simple confusion matrix the way classification does. Instead, predicted boxes are matched to ground-truth boxes based on IoU, and the matches define the counts below.

True Positives (TP): Predicted boxes correctly matched to ground truth (IoU >= threshold)
False Positives (FP): Predicted boxes with no matching ground truth
False Negatives (FN): Ground truth boxes missed by predictions
True Negatives (TN): Not usually defined in object detection

Example for one class:

Ground truth boxes: 10
Predicted boxes: 12
TP: 8
FP: 4 (12 - 8)
FN: 2 (10 - 8)

Precision = TP / (TP + FP) = 8 / 12 ≈ 0.67
Recall = TP / (TP + FN) = 8 / 10 = 0.8
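
The arithmetic above can be reproduced with a small helper. This is a sketch where the TP/FP/FN counts are taken from the example rather than computed by IoU matching:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts, guarding against division by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Numbers from the worked example: 10 ground-truth boxes, 12 predictions, 8 matched.
p, r = precision_recall(tp=8, fp=4, fn=2)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.67, recall=0.80
```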

Precision vs Recall tradeoff with Faster R-CNN

In object detection, precision means how many detected objects are correct. Recall means how many real objects are found.

Example 1: High precision, low recall
The model only detects very clear objects, so almost all detections are correct (high precision). But it misses many objects (low recall).

Example 2: High recall, low precision
The model detects many objects, including uncertain ones. It finds most real objects (high recall) but also many wrong detections (low precision).

Depending on the use case, you may want to balance precision and recall. For example, in self-driving cars, missing an object (low recall) can be dangerous, so recall is very important.
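
The tradeoff can be seen by sweeping the detector's confidence threshold. The sketch below uses hypothetical (score, is_correct) pairs standing in for real model output; raising the threshold keeps only confident detections (precision up, recall down), and lowering it does the reverse:

```python
# Toy detections: (confidence score, is_correct), where is_correct means the
# detection matched a ground-truth box. Assumes 10 ground-truth objects.
detections = [
    (0.95, True), (0.90, True), (0.85, True),
    (0.70, True), (0.60, False), (0.55, True), (0.50, False),
    (0.40, True), (0.35, False), (0.30, True), (0.25, False), (0.20, True),
]
num_ground_truth = 10

def prec_rec_at(threshold):
    """Precision and recall when only detections at or above `threshold` are kept."""
    kept = [correct for score, correct in detections if score >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / num_ground_truth
    return precision, recall

for t in (0.8, 0.2):
    p, r = prec_rec_at(t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

At threshold 0.8 all kept detections are correct but most objects are missed (precision 1.00, recall 0.30); at threshold 0.2 the numbers land at the 0.67 / 0.8 figures from the worked example above.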

What good vs bad metric values look like for Faster R-CNN

Good values:

  • mAP > 0.7 means the model detects objects well with good accuracy.
  • Precision and recall both above 0.7 show balanced detection.
  • Predicted boxes consistently meet the IoU threshold (e.g., 0.5).

Bad values:

  • mAP below 0.3 means poor detection or many wrong boxes.
  • Precision very low (< 0.5) means many false detections.
  • Recall very low (< 0.5) means many missed objects.
  • Bounding boxes with low IoU (< 0.3) mean inaccurate localization.
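
To make the mAP numbers above concrete, here is a minimal sketch of how Average Precision is computed from a precision-recall curve, using the standard interpolated-precision correction. The recall/precision values are hypothetical:

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve with interpolated precision:
    at each recall level, use the maximum precision at that recall or any
    higher recall, so the curve is monotonically non-increasing."""
    interp = list(precisions)
    for i in range(len(interp) - 2, -1, -1):
        interp[i] = max(interp[i], interp[i + 1])
    # Sum the area of rectangles between consecutive recall points.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, interp):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# Hypothetical curve, sorted by increasing recall:
recalls = [0.2, 0.4, 0.6, 0.8]
precisions = [1.0, 0.9, 0.7, 0.6]
print(round(average_precision(recalls, precisions), 3))  # 0.64
```

mAP is then the mean of this AP over all classes; by the scale above, 0.64 sits between the "good" (> 0.7) and "bad" (< 0.3) ranges.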

Common pitfalls in Faster R-CNN metrics

  • Ignoring class imbalance: Some classes may have few examples, causing misleadingly high mAP if only common classes are detected well.
  • Using accuracy: Accuracy is not useful for detection because most of an image is background, which can inflate accuracy.
  • Overfitting: Very high training mAP but low validation mAP means the model memorizes training data and won't generalize.
  • Data leakage: If test images appear in training, metrics will be unrealistically high.
  • Wrong IoU threshold: Too low an IoU threshold can make bounding boxes seem more accurate than they are.
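
The last pitfall is easy to demonstrate: the same sloppy box counts as a true positive at a loose IoU threshold but not at the common 0.5 threshold. A self-contained sketch with a toy box pair:

```python
def iou(a, b):
    """IoU for two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

ground_truth = (0, 0, 10, 10)
sloppy_box = (4, 0, 14, 10)  # shifted prediction, IoU = 60/140 ~ 0.43

for threshold in (0.3, 0.5):
    hit = iou(ground_truth, sloppy_box) >= threshold
    print(f"IoU threshold {threshold}: counted as TP? {hit}")  # True, then False
```

Dropping the threshold from 0.5 to 0.3 turns this miss into a "hit", inflating TP, precision, and recall without the boxes getting any better.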

Self-check question

Your Faster R-CNN model has 98% accuracy but only 12% recall on detecting pedestrians. Is this good for production? Why or why not?

Answer: No, this is not good. Accuracy is misleading here because most image areas are background, so the model is mostly correct by saying "no pedestrian". But 12% recall means it misses 88% of pedestrians, which is dangerous for safety. You need to improve recall to catch more pedestrians.

Key Result
Mean Average Precision (mAP) is the key metric for Faster R-CNN, balancing precision and recall to measure detection quality.