Hand and face landmark detection in Computer Vision - Model Metrics & Evaluation

Metrics & Evaluation - Hand and face landmark detection

Which metric matters for hand and face landmark detection, and why

For hand and face landmark detection, the key metric is the distance between predicted and true points, usually reported as Normalized Mean Error (NME) or, less often, as Mean Squared Error (MSE). Both measure how close the predicted points are to the true points on the hand or face.

We want the predicted landmarks to be as close as possible to the real landmarks, so a smaller error means a better model.

Sometimes, Percentage of Correct Keypoints (PCK) is used. It is the fraction of predicted points that fall within a chosen threshold distance of their true points, which expresses accuracy in a more intuitive way. The threshold is often set relative to a reference length such as the bounding-box size.
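As a rough illustration of PCK, the sketch below counts how many predicted points land within a pixel threshold of their true points. The function name `pck`, the sample coordinates, and the 5-pixel threshold are all assumptions made for this example, not values from a specific benchmark.

```python
import math

def pck(pred, true, threshold):
    """Fraction of predicted keypoints within `threshold` of the true point."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same number of points")
    hits = sum(
        1
        for (px, py), (tx, ty) in zip(pred, true)
        if math.hypot(px - tx, py - ty) <= threshold
    )
    return hits / len(true)

# Hypothetical landmarks: three predictions are close, one is 15 pixels off.
true_pts = [(50, 100), (80, 120), (110, 90), (140, 60)]
pred_pts = [(52, 98), (81, 121), (125, 90), (140, 63)]

score = pck(pred_pts, true_pts, threshold=5.0)
print(f"PCK@5px = {score:.2f}")  # PCK@5px = 0.75
```

In practice the threshold is usually expressed as a fraction of a reference length rather than a fixed pixel count, so that the score is comparable across image sizes.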

Confusion matrix or equivalent visualization

Landmark detection is a regression task, so a confusion matrix does not apply directly.

Instead, we use the distance between each predicted point and its true point. For example:

    True point: (x=50, y=100)
    Predicted point: (x=52, y=98)
    Distance error = sqrt((52-50)^2 + (98-100)^2) = sqrt(4 + 4) = sqrt(8) ≈ 2.83 pixels

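We calculate this distance for every landmark and average the results to get the mean error. Dividing that mean by a reference length (such as the image size, the bounding-box size, or the inter-ocular distance for faces) gives NME. Averaging the squared distances instead gives MSE.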
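The averaging step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, MSE is taken here as the mean of squared point distances (some libraries average over individual coordinates instead), and the reference length of 256 is just an assumed image size.

```python
import math

def per_point_distances(pred, true):
    """Euclidean distance between each predicted point and its true point."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same number of points")
    return [math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, true)]

def mean_point_error(pred, true):
    """Average distance in pixels."""
    dists = per_point_distances(pred, true)
    return sum(dists) / len(dists)

def mean_squared_error(pred, true):
    """Average squared distance per landmark (one common convention)."""
    squared = [(px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(pred, true)]
    return sum(squared) / len(squared)

def normalized_mean_error(pred, true, reference_length):
    """Mean point error divided by a reference length (assumed: image size)."""
    return mean_point_error(pred, true) / reference_length

# The single-point example from the text.
true_pts = [(50, 100)]
pred_pts = [(52, 98)]

print(f"mean error: {mean_point_error(pred_pts, true_pts):.2f} px")  # 2.83 px
print(f"MSE: {mean_squared_error(pred_pts, true_pts):.2f}")          # 8.00
print(f"NME: {normalized_mean_error(pred_pts, true_pts, 256):.4f}")
```

Note that MSE is in squared pixels, so it is not directly comparable to the mean error or to NME.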

Precision vs Recall tradeoff with concrete examples

Precision and recall are not typical metrics here because this is a regression task, not a classification task.

Instead, the tradeoff is between accuracy of landmark localization and model speed or complexity.

For example, a very accurate model might take longer to run, which is bad for real-time apps like video calls.

A faster model might predict landmarks less precisely, causing small errors in applications like gesture control.

Choosing the right balance depends on the use case.
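To weigh speed against accuracy, it helps to measure latency alongside the error metrics. The sketch below times a placeholder prediction function and compares it with a per-frame budget. `dummy_landmark_model` is a hypothetical stand-in for a real model, and the 30 FPS target is an assumption chosen for the example.

```python
import time

def dummy_landmark_model(image):
    """Hypothetical stand-in for a real model; returns 21 fake (x, y) points."""
    return [(float(i), float(i)) for i in range(21)]

def mean_latency_ms(predict, image, runs=50, warmup=5):
    """Average wall-clock time per call, in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        predict(image)
    start = time.perf_counter()
    for _ in range(runs):
        predict(image)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / runs

FRAME_BUDGET_MS = 1000.0 / 30  # assumed real-time target of 30 frames per second

latency = mean_latency_ms(dummy_landmark_model, image=None)
fits = latency <= FRAME_BUDGET_MS
print(f"mean latency: {latency:.3f} ms, fits 30 FPS budget: {fits}")
```

Reporting latency and landmark error side by side for each candidate model makes the tradeoff explicit for the target use case.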

What "good" vs "bad" metric values look like for this use case

Good: As a rough guide, an average landmark error below 5 pixels on a 256x256 image (about 2% of the image width), or PCK above 90% within a small threshold.

This means most points are very close to the true landmarks, so the model is reliable.

Bad: An average error above 15 pixels on the same image size, or PCK below 70%. This means landmarks are often far from the correct spots, causing poor results in applications.

Metrics pitfalls

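Ignoring scale: Measuring error in pixels without normalizing for image or object size can mislead, because the same pixel error is small on a large image and large on a small one. Use a normalized error.
Overfitting: A very low error on training data but a high error on new images means the model is memorizing instead of generalizing.
Data leakage: Testing on images very similar to the training images (for example, frames from the same video) can inflate performance.
Using classification metrics: Precision and recall do not measure how far a landmark is from its true position, so they do not apply here and can confuse evaluation.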
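The scale pitfall is easy to see with numbers. In this small sketch, two hypothetical results have very different pixel errors but the same normalized error, because the images differ in size. The function name and the sample values are assumptions for illustration.

```python
def normalized_error(pixel_error, reference_length):
    """Pixel error divided by a reference length such as image width."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return pixel_error / reference_length

# Hypothetical results: the same model quality measured at two resolutions.
small_image = normalized_error(pixel_error=5.0, reference_length=256)
large_image = normalized_error(pixel_error=20.0, reference_length=1024)

print(small_image, large_image)  # 0.01953125 0.01953125
```

Compared in raw pixels, the second result looks four times worse. After normalization the two are identical.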

Self-check question

Your hand landmark model has an average normalized error of 0.12 (12%) on test data. Is it good for production? Why or why not?

Answer: A normalized error of 0.12 means landmarks are, on average, 12% of the normalizing length (here, the image size) away from the true points. This is quite high and may cause noticeable mistakes in applications. Usually, errors below 5% are preferred for good quality, so this model likely needs improvement before production.
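To make the answer concrete, the normalized error can be converted back to pixels. The sketch below assumes the error was normalized by the width of a 256-pixel image. That size is an assumption for illustration, since the question does not state it.

```python
def normalized_to_pixels(normalized_error, reference_length):
    """Convert a normalized error back to pixels for a given reference length."""
    return normalized_error * reference_length

IMAGE_SIZE = 256  # assumed reference length in pixels

model_px = normalized_to_pixels(0.12, IMAGE_SIZE)
target_px = normalized_to_pixels(0.05, IMAGE_SIZE)

print(f"model: {model_px:.2f} px, 5% target: {target_px:.2f} px")
# model: 30.72 px, 5% target: 12.80 px
```

Under that assumption the model is off by roughly 31 pixels on average, well beyond the 5% target.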

Key Result

Normalized Mean Error (NME) and Mean Squared Error (MSE) are the key metrics. Both show how close predicted landmarks are to the true points.

Practice

1. (Easy) What is the main purpose of hand and face landmark detection in computer vision?

A. To compress video files
B. To increase image resolution
C. To change the color of images
D. To find key points on hands and faces in images or videos
