For Haar cascade face detection, precision and recall are the most important metrics. Precision tells us what fraction of the detected faces are actually faces (few false alarms). Recall tells us what fraction of the real faces the detector finds (few misses). Because both missed faces and false detections matter, we want a good balance between the two. The F1 score, the harmonic mean of precision and recall, combines them into a single number that summarizes overall detection quality.
Haar cascade face detection in Computer Vision - Model Metrics & Evaluation
| | Actual Face | Actual No Face |
|---|---|---|
| Predicted Face | True Positive (TP) | False Positive (FP) |
| Predicted No Face | False Negative (FN) | True Negative (TN) |
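Written in terms of the four cells, the standard definitions of the metrics used throughout this section are:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
$$

$$
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}
$$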
Example:
TP = 80 (faces correctly detected)
FP = 20 (non-faces wrongly detected as faces)
FN = 15 (faces missed)
TN = 885 (non-faces correctly ignored)
Total samples = 80 + 20 + 15 + 885 = 1000
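Plugging the example counts into the standard formulas gives the numbers below. This is a minimal sketch; `detection_metrics` is a hypothetical helper name, not part of OpenCV or any other library.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of detections that are real faces
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of real faces that were detected
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)      # share of all samples labelled correctly
    return precision, recall, f1, accuracy

p, r, f1, acc = detection_metrics(tp=80, fp=20, fn=15, tn=885)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} accuracy={acc:.3f}")
# precision=0.800 recall=0.842 f1=0.821 accuracy=0.965
```

The example detector is therefore fairly balanced: it finds about 84% of the faces, and 80% of its detections are real faces.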
If the detector is tuned to be very sensitive, it finds almost all faces (high recall) but also reports many false faces (low precision). The result is frequent false alarms, which can annoy users.
If the detector is strict, it reports only very clear faces (high precision) but misses harder ones (low recall). Undetected faces can be a serious problem for security or photo tagging.
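The sensitivity trade-off can be sketched by sweeping a decision threshold over candidate windows. This assumes each candidate carries a confidence score and counts as a detection when its score reaches the threshold; the scores and labels are made-up illustrative data, not output from a real cascade. In OpenCV, a Haar cascade's strictness is more commonly adjusted through `CascadeClassifier.detectMultiScale` parameters such as `minNeighbors` (higher values keep fewer, more reliable detections), which typically trades recall for precision in the same way.

```python
# Hypothetical (score, is_face) pairs for candidate windows; illustrative data only.
CANDIDATES = [
    (0.95, True), (0.90, True), (0.85, False), (0.80, True),
    (0.70, True), (0.60, False), (0.55, True), (0.40, False),
    (0.30, True), (0.20, False),
]

def precision_recall_at(threshold, candidates=CANDIDATES):
    """Treat every candidate with score >= threshold as a detected face."""
    tp = sum(1 for score, face in candidates if score >= threshold and face)
    fp = sum(1 for score, face in candidates if score >= threshold and not face)
    fn = sum(1 for score, face in candidates if score < threshold and face)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.25, 0.5, 0.75, 0.9):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.9 moves precision from 0.67 to 1.00 while recall falls from 1.00 to 0.33: the sensitive setting catches every face but raises false alarms, and the strict one is always right but misses most faces.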
For example, in a photo-tagging app, missing a face (low recall) is usually worse because users expect every face to be found, so recall matters more. In a security camera that alerts a human operator, frequent false alarms (low precision) waste the operator's attention, so precision may matter more.
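One standard way to encode such a preference is the F-beta score, a generalization of F1 that the text above does not itself name: beta > 1 weights recall more heavily, beta < 1 weights precision more heavily, and beta = 1 gives F1. The sketch below evaluates it on the worked example's precision (0.80) and recall (about 0.84); the beta values 2 and 0.5 are conventional illustrative choices, not requirements.

```python
def f_beta(precision, recall, beta):
    """F-beta: beta > 1 favours recall, beta < 1 favours precision, beta = 1 is F1."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

precision, recall = 80 / 100, 80 / 95   # from the worked example (TP=80, FP=20, FN=15)
print("photo app (F2):      ", round(f_beta(precision, recall, 2.0), 3))  # 0.833
print("security cam (F0.5): ", round(f_beta(precision, recall, 0.5), 3))  # 0.808
```

Because this detector's recall is higher than its precision, it scores better under the recall-weighted F2 than under the precision-weighted F0.5.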
Good: precision and recall both above 0.85, meaning most faces are found and few false alarms occur.
Bad: precision below 0.5 means many false detections; recall below 0.5 means many faces are missed.
For example, precision = 0.9 with recall = 0.9 is excellent. Precision = 0.4 with recall = 0.7 is poor because 60% of the detections are not faces.
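The two example operating points can be checked with F1 and the rough thresholds above. In this sketch, `rule_of_thumb` is a hypothetical helper that simply encodes those thresholds (above 0.85 is good, below 0.5 is bad); it is not a standard metric.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def rule_of_thumb(precision, recall):
    """Hypothetical helper: both above 0.85 is good, either below 0.5 is bad."""
    if precision > 0.85 and recall > 0.85:
        return "good"
    if precision < 0.5 or recall < 0.5:
        return "bad"
    return "in between"

for p, r in [(0.9, 0.9), (0.4, 0.7)]:
    print(f"P={p} R={r}  F1={f1(p, r):.3f}  {rule_of_thumb(p, r)}")
```

The poor detector's F1 is about 0.51 against 0.90 for the good one, even though its recall of 0.7 looks acceptable on its own: the harmonic mean is pulled toward the weaker of the two values.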
- Accuracy paradox: if most images contain no faces, a detector that always predicts "no face" can score high accuracy while being useless.
- Data leakage: testing on images that were also used for training falsely inflates the metrics.
- Overfitting: Detector works well on training images but poorly on new images, showing high training metrics but low real-world performance.
- Ignoring class imbalance: Faces are rare compared to non-faces, so metrics like accuracy can be misleading.
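The accuracy paradox and class imbalance can be seen with the worked example's counts: its 1,000 samples contain 95 real faces (TP + FN) and 905 non-faces (FP + TN). This minimal sketch compares the example detector with a useless baseline that always answers "no face"; `accuracy_and_recall` is a hypothetical helper name.

```python
# Ground-truth totals from the worked example.
FACES = 80 + 15        # TP + FN
NON_FACES = 20 + 885   # FP + TN

def accuracy_and_recall(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, recall

# Baseline that always answers "no face": every face becomes an FN, every non-face a TN.
base_acc, base_rec = accuracy_and_recall(tp=0, fp=0, fn=FACES, tn=NON_FACES)
det_acc, det_rec = accuracy_and_recall(tp=80, fp=20, fn=15, tn=885)
print(f"always 'no face': accuracy={base_acc:.3f} recall={base_rec:.3f}")
print(f"detector:         accuracy={det_acc:.3f} recall={det_rec:.3f}")
```

Accuracy separates the two by only 6 percentage points (0.905 versus 0.965), while recall separates them by about 84 (0.000 versus 0.842), which is why recall and precision are the more informative metrics here.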
Your Haar cascade face detector has 98% accuracy but only 12% recall on faces. Is it good for production? Why or why not?
Answer: No, it is not good. The high accuracy is misleading: because most images contain no faces, a detector that answers "no face" almost every time is still right most of the time. A recall of 12% means it misses 88% of real faces, which defeats the purpose of face detection.
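To see how both numbers can hold at once, here is one hypothetical confusion matrix with 10,000 samples of which only 200 are faces. The counts are invented for illustration; many other matrices are consistent with 98% accuracy and 12% recall.

```python
# Hypothetical counts chosen to match the question: 98% accuracy, 12% recall.
tp, fn = 24, 176      # 200 real faces, only 24 of them detected
fp, tn = 24, 9776     # 9,800 non-faces, almost all correctly ignored

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
missed = fn / (tp + fn)
print(f"accuracy={accuracy:.2f} recall={recall:.2f} faces missed={missed:.0%}")
# accuracy=0.98 recall=0.12 faces missed=88%
```

The 9,776 correctly ignored non-faces dominate the accuracy, so 176 missed faces barely move it.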