DNN-based face detection in Computer Vision - Model Metrics & Evaluation

For face detection using deep neural networks, the key metrics are Precision and Recall. Precision tells us how many detected faces are actually faces, so it measures false alarms. Recall tells us how many real faces the model finds, so it measures missed faces. Both matter: we want to find as many faces as possible (high recall) without wrongly marking non-faces as faces (high precision). The F1 score balances these two, and the Average Precision (AP) over different confidence thresholds is often used to summarize performance in a single number.
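AP can be computed by ranking detections by confidence and accumulating the area under the precision-recall curve. A minimal sketch (the scores, match flags, and face count below are invented for illustration):

```python
# Average Precision: area under the precision-recall curve,
# built by sweeping down the confidence-ranked detection list.

def average_precision(scores, is_true_positive, num_gt_faces):
    """scores: detection confidences; is_true_positive: whether each
    detection matches a real face; num_gt_faces: total real faces."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt_faces
        ap += precision * (recall - prev_recall)  # rectangle rule
        prev_recall = recall
    return ap

# Hypothetical ranked detections: 4 correct, 2 false alarms, 5 real faces
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50]
matches = [True, True, False, True, False, True]
print(round(average_precision(scores, matches, num_gt_faces=5), 3))  # 0.683
```

Benchmark toolkits (e.g. COCO-style evaluation) use interpolated variants of this curve, but the idea is the same.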
|                | Predicted Face | Predicted No Face |
|----------------|----------------|-------------------|
| Actual Face    | 90 (TP)        | 10 (FN)           |
| Actual No Face | 15 (FP)        | 885 (TN)          |
Total samples = 1000
Precision = TP / (TP + FP) = 90 / (90 + 15) = 0.857
Recall = TP / (TP + FN) = 90 / (90 + 10) = 0.9
F1 Score = 2 * (0.857 * 0.9) / (0.857 + 0.9) ≈ 0.878
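The numbers above can be reproduced directly from the confusion-matrix counts:

```python
# Metrics from the confusion matrix above
tp, fn, fp, tn = 90, 10, 15, 885

precision = tp / (tp + fp)          # 90 / 105
recall    = tp / (tp + fn)          # 90 / 100
f1        = 2 * precision * recall / (precision + recall)

print(f"Precision = {precision:.3f}")  # 0.857
print(f"Recall    = {recall:.3f}")     # 0.900
print(f"F1        = {f1:.3f}")         # 0.878
```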
If the model is tuned to be very strict, it will only detect faces when very sure. This means high precision (few false alarms) but low recall (misses many faces). This is good if false alarms are costly, like in security checks.
If the model is tuned to detect as many faces as possible, it will catch almost all faces (high recall) but also mark some non-faces as faces (low precision). This is useful in photo apps where missing a face is worse than a few false detections.
Choosing the right balance depends on the application needs.
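The strict-vs-lenient trade-off can be seen by sweeping a confidence threshold over a set of scored detections. A toy sketch (the scores, match flags, and face count are made up for illustration):

```python
# Each detection: (confidence score, does it actually cover a real face?)
detections = [(0.98, True), (0.95, True), (0.85, True), (0.70, False),
              (0.60, True), (0.40, False), (0.30, True)]
num_faces = 5  # total real faces in the test images

def precision_recall(threshold):
    """Keep only detections at or above the threshold, then score them."""
    kept = [hit for score, hit in detections if score >= threshold]
    if not kept:
        return 0.0, 0.0
    tp = sum(kept)
    return tp / len(kept), tp / num_faces

# Strict threshold: no false alarms, but most faces are missed
print(precision_recall(0.9))   # (1.0, 0.4)
# Lenient threshold: every face found, at the cost of false alarms
print(precision_recall(0.3))   # (~0.714, 1.0)
```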
- Good: Precision and Recall both above 0.85, F1 score near 0.9 or higher. This means most faces are found and few false alarms.
- Bad: Precision or Recall below 0.5 means many false alarms or many missed faces. For example, Precision 0.4 means more than half of the detected faces are wrong.
- Very high accuracy alone can be misleading if the dataset has many non-face images (class imbalance).
- Accuracy paradox: If most images have no faces, a model that always predicts no face can have high accuracy but is useless.
- Data leakage: Testing on images very similar to training images inflates metrics falsely.
- Overfitting: Very high training metrics but low test metrics means the model memorizes training faces but fails on new ones.
- Ignoring IoU thresholds: Face detection usually requires bounding boxes to overlap enough with ground truth. Metrics must consider this overlap.
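The IoU point is worth making concrete: a detection only counts as a true positive if its bounding box overlaps the ground-truth box by at least some threshold (0.5 is a common choice). A minimal sketch with made-up boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-Union for boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ground_truth = (10, 10, 50, 50)
detection    = (20, 20, 60, 60)
print(round(iou(ground_truth, detection), 3))  # 0.391
# At the usual IoU >= 0.5 threshold, this detection would NOT count as a TP,
# even though it clearly overlaps the face.
```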
Your face detection model has 98% accuracy but only 12% recall on faces. Is it good for production? Why or why not?
Answer: No, it is not good. The high accuracy is misleading because most images may not have faces. The very low recall means the model misses 88% of faces, which defeats the purpose of face detection.
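The accuracy paradox behind this answer can be checked numerically. A toy sketch, assuming an imbalanced test set of 980 no-face images and 20 face images (these counts are invented for illustration):

```python
# Accuracy paradox: a trivial "always no face" predictor on an imbalanced set.
num_no_face, num_face = 980, 20

# The trivial model never predicts a face:
tp, fn = 0, num_face          # misses every real face
tn, fp = num_no_face, 0       # never raises a false alarm

accuracy = (tp + tn) / (num_no_face + num_face)
recall = tp / (tp + fn)
print(f"Accuracy = {accuracy:.0%}, Recall = {recall:.0%}")
# Accuracy = 98%, Recall = 0% -- high accuracy, yet useless for detection.
```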