
Face landmark detection in Computer Vision - Model Metrics & Evaluation

Which metric matters for Face Landmark Detection and WHY

Face landmark detection finds key points on a face, such as the eye centers, nose tip, and mouth corners. The main metrics are Mean Squared Error (MSE) and, more commonly, Normalized Mean Error (NME), which divides the average point-to-point error by a reference length such as the inter-ocular distance. These measure how close the predicted points are to the ground-truth points; lower error means better accuracy. We use distance-based metrics because this is a regression task, not classification.

Confusion Matrix or Equivalent Visualization

Face landmark detection does not use a confusion matrix because it predicts coordinates, not classes. Instead, we visualize errors as distances between predicted and true points.

True points:       (x1, y1), (x2, y2), ..., (xN, yN)
Predicted points:  (x1', y1'), (x2', y2'), ..., (xN', yN')

Error per point = sqrt((x - x')^2 + (y - y')^2)
Mean Error = average of all point errors
NME = Mean Error / inter-ocular distance
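The error computation above can be sketched in a few lines of NumPy. The landmark coordinates below are made-up illustrative values, and the assumption that the first two points are the eye centers (used for the inter-ocular normalization) is ours, not a standard convention:

```python
import numpy as np

# Hypothetical ground-truth and predicted landmarks (5 points, x/y pairs).
true_pts = np.array([[30.0, 40.0], [70.0, 40.0], [50.0, 60.0],
                     [40.0, 80.0], [60.0, 80.0]])
pred_pts = np.array([[31.0, 41.0], [69.0, 39.0], [52.0, 61.0],
                     [41.0, 82.0], [59.0, 79.0]])

# Per-point Euclidean error: sqrt((x - x')^2 + (y - y')^2)
errors = np.linalg.norm(true_pts - pred_pts, axis=1)
mean_error = errors.mean()

# Normalize by the inter-ocular distance (here assumed to be the distance
# between the first two points) to get the NME.
inter_ocular = np.linalg.norm(true_pts[0] - true_pts[1])
nme = mean_error / inter_ocular

print(f"mean pixel error: {mean_error:.3f}, NME: {nme:.4f}")
```

With these toy values the NME comes out around 0.04, i.e. the average error is about 4% of the eye distance.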
Tradeoff: Precision vs Recall Equivalent

In face landmark detection, the tradeoff is between accuracy (how close points are) and robustness (working well on different faces and conditions). Improving accuracy might make the model sensitive to small changes, reducing robustness. Balancing these ensures landmarks are precise and reliable.

What "Good" vs "Bad" Metric Values Look Like

Good models have low mean error, often below 5% of the inter-ocular distance (distance between eyes). For example, an NME of 0.03 means average error is 3% of eye distance, which is good. Bad models have high errors, like 0.1 or more, meaning points are far from true locations.

Common Pitfalls in Metrics
  • Ignoring scale: Measuring error in pixels without normalization can mislead if face sizes vary.
  • Overfitting: Very low training error but high test error means model memorizes training faces.
  • Data leakage: Using same faces in training and testing inflates accuracy.
  • Not considering robustness: Good error on clear images but poor on occluded or rotated faces.
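The first pitfall (ignoring scale) is easy to demonstrate. In this sketch, with made-up coordinates, the same face is measured at two sizes with the same relative prediction offset: the raw pixel errors differ by 10x, but the NME is identical because normalization cancels the scale:

```python
import numpy as np

def nme(true_pts, pred_pts, left_eye=0, right_eye=1):
    """Mean point-to-point error normalized by inter-ocular distance.
    Assumes indices 0 and 1 are the eye centers (our convention here)."""
    errors = np.linalg.norm(true_pts - pred_pts, axis=1)
    inter_ocular = np.linalg.norm(true_pts[left_eye] - true_pts[right_eye])
    return errors.mean() / inter_ocular

small_true = np.array([[10.0, 10.0], [20.0, 10.0], [15.0, 18.0]])
small_pred = small_true + 0.5      # 0.5-pixel offset per coordinate

large_true = small_true * 10       # same face, 10x larger
large_pred = large_true + 5.0      # 5-pixel offset: same relative error

# Raw pixel errors differ by 10x, but the NME is the same for both faces.
print(nme(small_true, small_pred), nme(large_true, large_pred))
```

This is why benchmarks report NME rather than raw pixel error: it lets models be compared across datasets with different image resolutions and face sizes.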
Self Check

Your face landmark model has a mean error of 0.02 on training data but 0.08 on new faces. Is it good for real use? Why or why not?

Answer: No, it is not good. The model fits training faces well but performs poorly on new faces, showing overfitting and poor generalization. You need to improve robustness and test on diverse data.

Key Result
Mean error (e.g., Normalized Mean Error) is key to measure how close predicted face landmarks are to true points; lower values mean better accuracy.