Non-maximum suppression (NMS) is used in object detection to remove overlapping boxes and keep only the highest-scoring box for each object. The key metrics for evaluating NMS are precision and recall. Precision is the fraction of kept boxes that are correct (not false alarms), while recall is the fraction of true objects that were found. We want a balance: keep the true objects (high recall) while avoiding duplicate or wrong boxes (high precision). The Intersection over Union (IoU) threshold in NMS controls this balance: a box is removed when its IoU with a higher-scoring kept box exceeds the threshold.
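To make the overlap measure concrete, here is a minimal sketch of IoU for two axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions made for illustration, not something the text above prescribes.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-overlapping boxes: 50 / 150
```

An IoU of 1.0 means the boxes are identical and 0.0 means they do not overlap at all.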
Non-maximum suppression in PyTorch - Model Metrics & Evaluation
| | Predicted Object | Predicted No Object |
|------------------|------------------|---------------------|
| Actual Object | True Positive (TP) | False Negative (FN) |
| Actual No Object | False Positive (FP) | True Negative (TN) |
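TP: Correct boxes kept after NMS (each matches a true object)
FP: Wrong boxes kept (false alarms), including duplicate boxes on an object that is already detected
FN: True objects left without a kept box, either never detected or removed by NMS (missed objects)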
TN: Background correctly ignored
Total samples = TP + FP + FN + TN
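Precision and recall follow directly from these counts. The sketch below uses a hypothetical helper `precision_recall` and made-up counts purely for illustration. TN does not appear in either formula.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from detection counts."""
    # Precision: of all boxes kept, how many were correct?
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: of all true objects, how many were found?
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Made-up counts: 80 correct boxes, 20 false alarms, 10 missed objects.
p, r = precision_recall(tp=80, fp=20, fn=10)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.89
```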
If the IoU threshold is too low, NMS suppresses any box that overlaps a higher-scoring box even slightly. This increases precision (fewer duplicates and false alarms) but lowers recall, because correct boxes for nearby objects are removed too. For example, in a face detector, NMS that is too strict might drop a face that is partly hidden behind another face (low recall).
If the IoU threshold is too high, NMS keeps many overlapping boxes, increasing recall but lowering precision, because the extra boxes count as false alarms. For example, in a car detector, NMS that is too loose might keep several boxes for the same car (low precision).
Choosing the right IoU threshold balances precision and recall depending on the task needs.
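The effect of the threshold can be seen on a toy example. This is a minimal pure-Python sketch of greedy NMS, not the implementation used by any particular library. The boxes, scores, and helper names are invented for illustration. In PyTorch projects, `torchvision.ops.nms(boxes, scores, iou_threshold)` performs the same operation on tensors.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep box i only if its IoU with every already-kept box
        # does not exceed the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Toy data (made up): box 1 duplicates box 0, box 2 is a neighbouring object.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (5, 0, 15, 10)]
scores = [0.9, 0.8, 0.7]

for threshold in (0.2, 0.5, 0.9):
    print(threshold, nms(boxes, scores, threshold))
# 0.2 [0]        strict: the neighbouring object is lost (lower recall)
# 0.5 [0, 2]     duplicate removed, neighbour kept
# 0.9 [0, 1, 2]  loose: the duplicate survives (lower precision)
```

In practice the threshold would be chosen by running a sweep like this on a validation set and comparing the resulting precision and recall.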
- Good NMS: As a rough rule of thumb, precision and recall both above 0.8, meaning most true objects are detected and few false boxes remain.
- Bad NMS (low precision): Precision below 0.5 means that more than half of the kept boxes are false or duplicate, cluttering results.
- Bad NMS (low recall): Recall below 0.5 means that more than half of the true objects are missed, which is bad for safety-critical tasks like pedestrian detection.
- Bad NMS (unbalanced): Very high recall with very low precision means many duplicates or false alarms.
- Ignoring IoU threshold: Different IoU thresholds change precision and recall drastically, so always report the threshold used.
- Overfitting to training data: An IoU threshold tuned too tightly on training data may fail on new images, so tune it on held-out data.
- Confusing precision and recall: Precision is about false alarms, recall is about missed objects.
- Not considering class imbalance: If some classes are rare, metrics can be misleading.
- Using only accuracy: Accuracy is not meaningful for object detection because background dominates.
Your object detector with NMS has 98% accuracy but only 12% recall on pedestrians. Is it good for production? Why or why not?
Answer: No, it is not good. The high accuracy is misleading because most of the image is background (easy to classify). The very low recall means the detector misses most pedestrians, which is dangerous for applications like self-driving cars. Improving recall while keeping precision reasonable is critical.
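The arithmetic behind this answer can be checked with a small sketch. The counts below are hypothetical, chosen only so that they reproduce the 98% accuracy and 12% recall from the question.

```python
# Hypothetical counts over 10,000 candidate regions, 100 of which are pedestrians.
tp, fn = 12, 88        # pedestrians found / missed
fp, tn = 112, 9788     # false alarms / background correctly ignored

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
# accuracy=0.98 recall=0.12 precision=0.10
```

Almost all of the accuracy comes from the 9,788 background regions, while 88 of the 100 pedestrians are missed.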