
Human pose estimation concept in Computer Vision - Model Metrics & Evaluation

Which metric matters for Human Pose Estimation and WHY

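In human pose estimation, the goal is to locate keypoints on the body such as elbows, knees, and wrists. A widely used metric is Percentage of Correct Keypoints (PCK): the share of predicted keypoints that fall within a chosen distance of their true positions. The distance threshold is usually set relative to body size (for example, a fraction of the head or torso length) so that scores stay comparable for people at different scales. PCK matters because it directly measures how accurately the model localizes body parts.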

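Another important metric is mean Average Precision (mAP) for keypoints, which accounts for both precision and recall. On benchmarks such as COCO it is computed from Object Keypoint Similarity (OKS), a score that plays the role for keypoints that IoU plays for bounding boxes. mAP shows how well the model finds all keypoints without producing many false ones.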
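As a hedged illustration of the similarity score behind keypoint mAP, the sketch below computes Object Keypoint Similarity (OKS) in the form used by the COCO keypoint benchmark: each labeled keypoint contributes exp(-d^2 / (2 * s^2 * k^2)), where d is the pixel error, s^2 is the person's area, and k is a per-keypoint constant. The function name `oks`, the sample coordinates, and the constant values are illustrative assumptions, not the benchmark's official values.

```python
import math

def oks(true_pts, pred_pts, labeled, area, k):
    """Object Keypoint Similarity for one person, averaged over labeled keypoints."""
    total = 0.0
    count = 0
    for (x, y), (xp, yp), is_labeled, k_i in zip(true_pts, pred_pts, labeled, k):
        if not is_labeled:
            continue  # unlabeled keypoints do not affect the score
        d2 = (x - xp) ** 2 + (y - yp) ** 2  # squared pixel error
        total += math.exp(-d2 / (2 * area * k_i ** 2))
        count += 1
    return total / count if count else 0.0

# Hypothetical person with three keypoints; the third one is not labeled.
true_pts = [(100, 200), (150, 260), (90, 310)]
pred_pts = [(103, 204), (150, 262), (300, 300)]
labeled = [True, True, False]
k = [0.1, 0.1, 0.1]   # placeholder constants, not official values
area = 100 * 100      # person area in square pixels (assumed)

score = oks(true_pts, pred_pts, labeled, area, k)
print(round(score, 3))
```

A predicted pose is then typically counted as a match when its OKS exceeds a cutoff, and the final AP is averaged over several cutoffs.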

Confusion Matrix or Equivalent Visualization

Human pose estimation does not use a classic confusion matrix because it predicts locations, not classes. Instead, we use a distance threshold to decide if a predicted keypoint is correct.

True Keypoint Location: (x, y)
Predicted Keypoint Location: (x', y')
Distance = sqrt((x - x')^2 + (y - y')^2)

If Distance < threshold: Correct Keypoint (TP)
Else: Incorrect Keypoint (FP if a point was predicted in the wrong place, FN if the true keypoint was missed)

Dividing the number of correct keypoints by the total number of keypoints gives the PCK score.
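The distance rule above can be sketched in a few lines of Python. This is a minimal example, assuming one predicted point per true keypoint and a threshold given in pixels; the function name `pck` and the sample coordinates are made up for illustration.

```python
import math

def pck(true_pts, pred_pts, threshold):
    """Fraction of predicted keypoints closer than `threshold` to the truth."""
    if len(true_pts) != len(pred_pts):
        raise ValueError("true_pts and pred_pts must have the same length")
    if not true_pts:
        raise ValueError("at least one keypoint is required")
    correct = 0
    for (x, y), (xp, yp) in zip(true_pts, pred_pts):
        # Euclidean distance between the true and predicted location
        if math.hypot(x - xp, y - yp) < threshold:
            correct += 1
    return correct / len(true_pts)

# Hypothetical keypoints for one person, in pixel coordinates.
true_pts = [(100, 200), (150, 260), (90, 310), (210, 305)]
pred_pts = [(103, 204), (150, 262), (120, 350), (212, 300)]

score = pck(true_pts, pred_pts, threshold=10)
print(score)  # 0.75: three of the four keypoints are within 10 pixels
```

In practice the threshold is often expressed as a fraction of a body-size reference rather than a fixed pixel count, so the same function would be called with a per-person threshold.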

Precision vs Recall Tradeoff with Examples

In pose estimation, precision is the fraction of predicted keypoints that are actually correct. Recall is the fraction of true keypoints that the model found.

If the model predicts many points, it may have high recall but low precision (many false points). If it predicts fewer points, it may have high precision but low recall (missing some keypoints).

Example: For a fitness app, missing a wrist keypoint (low recall) can make the app give wrong feedback. So recall is important. But too many wrong points (low precision) can confuse the app. A balance is needed.
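To make the tradeoff concrete, here is a small sketch that counts TP, FP, and FN for named keypoints and derives precision and recall from them. It assumes one counting convention among several possible ones: a point predicted too far from the truth counts as both a false positive and a false negative. The function name, keypoint names, and coordinates are hypothetical.

```python
import math

def keypoint_precision_recall(truth, preds, threshold):
    """Return (tp, fp, fn, precision, recall) for dicts of name -> (x, y)."""
    tp = 0
    for name, (xp, yp) in preds.items():
        if name in truth:
            x, y = truth[name]
            if math.hypot(x - xp, y - yp) < threshold:
                tp += 1
    fp = len(preds) - tp   # predicted points that are wrong or extra
    fn = len(truth) - tp   # true keypoints that were missed or misplaced
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truth) if truth else 0.0
    return tp, fp, fn, precision, recall

# Hypothetical example: right wrist is misplaced, right elbow is missing.
truth = {
    "left_wrist": (50, 80), "right_wrist": (120, 82),
    "left_elbow": (60, 60), "right_elbow": (110, 61),
}
preds = {
    "left_wrist": (52, 81), "right_wrist": (160, 90),
    "left_elbow": (61, 59),
}

tp, fp, fn, precision, recall = keypoint_precision_recall(truth, preds, threshold=10)
print(tp, fp, fn, round(precision, 2), recall)  # 2 1 2 0.67 0.5
```

Here the model outputs few points, so precision (0.67) is higher than recall (0.5), which matches the "fewer points" case described above.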

What Good vs Bad Metric Values Look Like

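Good: As a rough guide, PCK above 85% means most keypoints are found within the allowed distance. mAP close to 0.9 means high accuracy and coverage.

Bad: PCK below 50% means many keypoints are missed or wrongly placed. mAP below 0.5 shows poor detection quality. These cutoffs are rules of thumb: what counts as good depends on the dataset and on the distance threshold used, so always report the threshold alongside the score.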

Good models help apps track poses well. Bad models give wrong or missing body points, making apps unreliable.

Common Metrics Pitfalls
  • Ignoring the distance threshold: Using too large a threshold inflates PCK, making the model seem better than it is.
  • Data leakage: Testing on images that overlap with or closely resemble the training set (for example, frames from the same video) can give unrealistically high scores.
  • Overfitting: High training PCK but low test PCK means the model memorizes poses instead of generalizing.
  • Not considering occlusions: Keypoints hidden by objects or other people cannot be localized reliably, so scoring them like visible keypoints can lower recall unfairly.
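The first pitfall is easy to demonstrate. The sketch below scores the same set of per-keypoint errors at several thresholds, each expressed as a fraction `alpha` of a reference length. The error values and the 40-pixel `head_size` are invented for illustration.

```python
def pck_from_errors(errors, threshold):
    """PCK given precomputed pixel distances between truth and prediction."""
    return sum(e < threshold for e in errors) / len(errors)

# Hypothetical pixel errors for six keypoints of one person.
errors = [2.0, 6.0, 12.0, 18.0, 35.0, 60.0]
head_size = 40.0  # hypothetical reference length in pixels

# The same predictions, scored at increasingly lenient thresholds.
scores = {
    alpha: pck_from_errors(errors, alpha * head_size)
    for alpha in (0.1, 0.2, 0.5, 1.0)
}
for alpha, value in scores.items():
    print(f"PCK@{alpha}: {value:.2f}")
```

The predictions never change, yet the score climbs from about 0.17 to about 0.83 as the threshold loosens, which is why a PCK value is only meaningful together with its threshold.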
Self-Check Question

Your human pose estimation model has 90% PCK on training images but only 60% PCK on new images. Is it good for production? Why or why not?

Answer: No, it is not good. The big drop from training to new images shows overfitting. The model does not generalize well to new poses or backgrounds. It needs more training data or better design to improve real-world performance.
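The reasoning in the answer can be written as a simple check, assuming both figures are PCK measured at the same threshold. The tolerance `MAX_GAP` is a hypothetical value chosen for this example, not a standard cutoff.

```python
def generalization_gap(train_pck, test_pck):
    """Difference between training and held-out PCK (both as fractions)."""
    return train_pck - test_pck

MAX_GAP = 0.10  # hypothetical tolerance; pick one that suits the application

gap = generalization_gap(train_pck=0.90, test_pck=0.60)
overfit = gap > MAX_GAP
print(f"gap = {gap:.2f}, likely overfitting: {overfit}")
```

A gap of 0.30 is far above the assumed tolerance, so the check flags the model as likely overfitting.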

Key Result
Percentage of Correct Keypoints (PCK) is the key metric for measuring how accurately body keypoints are located within a distance threshold.