Pre-trained detection models in Computer Vision - Model Metrics & Evaluation

For pre-trained detection models, the key metrics are Precision, Recall, and F1 score. These models locate objects in images, so we want to know how many detected objects are correct (Precision) and how many real objects were found (Recall). The F1 score balances the two. Mean Average Precision (mAP) is also commonly used to summarize detection quality across classes and confidence thresholds.
|                  | Predicted Object    | Predicted No Object |
|------------------|---------------------|---------------------|
| Actual Object    | True Positive (TP)  | False Negative (FN) |
| Actual No Object | False Positive (FP) | True Negative (TN)  |
- TP: Correctly detected objects
- FP: Wrong detections (false alarms)
- FN: Missed objects
- TN: Correctly ignored background
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
If the model is very strict, it detects fewer objects but with high confidence. This means high precision but low recall. For example, in security cameras, you want to avoid false alarms (high precision).
If the model detects many objects, including uncertain ones, it has high recall but low precision. For example, in wildlife monitoring, missing an animal is worse, so high recall is preferred.
The F1 score, the harmonic mean of the two, helps balance them depending on the use case:

F1 = 2 * Precision * Recall / (Precision + Recall)
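The three formulas above can be sketched in a few lines of Python. The counts in the example are hypothetical, chosen only to illustrate the arithmetic:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall, and F1 from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical run: 80 correct detections, 10 false alarms, 20 missed objects
p, r, f1 = precision_recall_f1(tp=80, fp=10, fn=20)
print(f"Precision={p:.3f} Recall={r:.3f} F1={f1:.3f}")
# Precision=0.889 Recall=0.800 F1=0.842
```

The guards against zero denominators matter in practice: a model that predicts nothing has TP + FP = 0, and the naive formula would divide by zero.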
Good (rough rule of thumb): Precision and Recall both above 0.8, F1 score near 0.85 or higher, and mAP above 0.75. This means the model finds most objects correctly and misses few. Exact targets depend on the application and the IoU threshold used.
Bad: Precision below 0.5 means many false detections. Recall below 0.5 means many missed objects. Low F1 and mAP indicate poor detection quality.
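To make mAP concrete, here is a minimal sketch of Average Precision for a single class: detections are sorted by confidence, precision/recall are accumulated, and the area under the precision-recall curve is summed with a simple rectangle rule (real mAP implementations such as COCO's add interpolation and average over IoU thresholds, which is skipped here). The input format is an assumption for illustration:

```python
def average_precision(scored_detections, num_gt):
    """Simplified AP: area under the precision-recall curve.

    scored_detections: list of (confidence, is_true_positive) pairs.
    num_gt: number of ground-truth objects for this class.
    """
    dets = sorted(scored_detections, key=lambda d: -d[0])  # high confidence first
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, is_match in dets:
        if is_match:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)  # rectangle under the PR curve
        prev_recall = recall
    return ap

# Hypothetical class with 2 ground-truth objects and 3 detections
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2))
```

mAP is then just the mean of this value over all classes (and, in COCO-style evaluation, over several IoU thresholds).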
- Accuracy paradox: High accuracy can be misleading if most images have no objects (TN dominate).
- Data leakage: Using test images in training inflates metrics falsely.
- Overfitting: Very high training metrics but low test metrics show poor generalization.
- Ignoring IoU threshold: a detection counts as a TP only if its Intersection over Union with a ground-truth box exceeds a threshold (commonly 0.5); the same predictions give different Precision and Recall at different thresholds.
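The IoU dependence in the last pitfall is easy to see with a small sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) format:

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A slightly shifted prediction: IoU = 80 / 120 = 0.667, so it counts as a
# TP at threshold 0.5 but as a FP (and a missed FN) at threshold 0.75.
print(iou((0, 0, 10, 10), (2, 0, 12, 10)))
```

The same model therefore reports different Precision/Recall depending solely on the evaluation threshold, so always state the IoU threshold alongside the metrics.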
Your pre-trained detection model has 98% accuracy but only 12% recall on detecting cars. Is it good for production? Why or why not?
Answer: No, it is not good. The high accuracy is misleading: most images likely contain no cars, so the model racks up easy true negatives. A recall of 12% means it misses 88% of the cars, which is unacceptable for a detection task where finding objects is the whole point.
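The accuracy paradox in this answer can be shown with made-up numbers (the 1000-image dataset below is purely illustrative):

```python
# Hypothetical image-level counts: 1000 images, only 50 contain a car.
tp, fn = 6, 44    # finds 6 of 50 cars -> recall = 12%
tn, fp = 948, 2   # the car-free majority class dominates

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.2%} recall={recall:.0%}")
# accuracy=95.40% recall=12%
```

Accuracy looks excellent while the model misses almost every car, which is why recall (and mAP) rather than accuracy should drive the production decision for detection.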