
Stereo vision concept in Computer Vision - Model Metrics & Evaluation

Which metric matters for Stereo Vision and WHY

Stereo vision estimates depth by comparing two images from slightly different views. The key metric is disparity error, which measures how close the estimated pixel shifts are to the true shifts. Lower disparity error means more accurate depth perception.

In machine learning models for stereo vision, the standard metrics are Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) computed over disparity values. These show how far the predicted depth is from the real depth.
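As a minimal sketch, MAE and RMSE over a disparity map can be computed with NumPy; the function name, the optional validity mask, and the toy 2x2 maps below are illustrative assumptions, not part of any standard API.

```python
import numpy as np

def disparity_errors(pred, gt, valid_mask=None):
    """Compute MAE and RMSE between predicted and ground-truth disparity maps.

    pred, gt: arrays of disparities in pixels.
    valid_mask: optional boolean array marking pixels with valid ground truth.
    """
    if valid_mask is None:
        valid_mask = np.ones_like(gt, dtype=bool)
    diff = np.abs(pred[valid_mask] - gt[valid_mask])
    mae = diff.mean()                   # mean absolute disparity error
    rmse = np.sqrt((diff ** 2).mean())  # penalizes large outliers more heavily
    return mae, rmse

# Toy example with hypothetical 2x2 disparity maps (values in pixels)
gt = np.array([[10.0, 12.0], [8.0, 9.0]])
pred = np.array([[10.5, 11.0], [8.0, 10.0]])
mae, rmse = disparity_errors(pred, gt)
```

Because RMSE squares each error before averaging, a few badly wrong pixels raise RMSE much more than MAE, which is why the two are often reported together.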

Why? Because the goal is to get depth right, so measuring the difference between predicted and actual depth is the best way to know if the model works well.

Confusion Matrix or Equivalent Visualization

Stereo vision is a regression task, not a classification task, so a confusion matrix does not apply directly.

Instead, we use an error distribution table or histogram showing how many pixels have disparity error within certain ranges.

Disparity Error Range | Number of Pixels
----------------------|-----------------
0 - 1 pixel           | 8500
1 - 2 pixels          | 1200
2 - 3 pixels          | 200
3+ pixels             | 100
Total Pixels          | 10000

This shows most pixels have very low error, meaning good depth estimation.
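A table like the one above can be produced by bucketing absolute disparity errors with a histogram. This is a sketch; the bin edges and the tiny example arrays are assumptions chosen to mirror the table's ranges.

```python
import numpy as np

def error_histogram(pred, gt, bins=(0, 1, 2, 3, np.inf)):
    """Bucket absolute disparity errors into ranges, as in the table above."""
    err = np.abs(np.asarray(pred) - np.asarray(gt)).ravel()
    counts, _ = np.histogram(err, bins=bins)
    labels = ["0-1 px", "1-2 px", "2-3 px", "3+ px"]
    return dict(zip(labels, counts.tolist()))

# Hypothetical errors for six pixels (ground truth taken as zero for brevity)
pred = np.array([0.2, 0.8, 1.5, 2.4, 5.0, 0.1])
gt = np.zeros(6)
hist = error_histogram(pred, gt)
```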

Precision vs Recall Tradeoff (or Equivalent) with Examples

In stereo vision, the tradeoff is between accuracy and completeness of depth estimation.

If the model is very strict, estimating depth only where it is confident, it achieves high accuracy but low coverage: fewer pixels get a depth value, but those values are reliable.

If the model tries to estimate depth everywhere, it may have high coverage but lower accuracy, because some estimates are wrong.

Example:

  • High accuracy, low coverage: 95% of the estimated pixels have error < 1 pixel, but depth is estimated for only 70% of image pixels.
  • High coverage, lower accuracy: depth is estimated for 100% of pixels, but only 80% have error < 1 pixel.

Choosing depends on the application. For robot navigation, high accuracy on important pixels matters more.
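The tradeoff above can be sketched by sweeping a confidence threshold: raising it shrinks coverage but usually raises accuracy. The per-pixel confidence array and the threshold values here are hypothetical, assuming the model emits a confidence alongside each disparity.

```python
import numpy as np

def coverage_accuracy(pred, gt, confidence, conf_thresh, err_thresh=1.0):
    """Evaluate the accuracy/coverage tradeoff at one confidence threshold.

    Pixels with confidence below conf_thresh get no depth estimate.
    Returns (coverage, accuracy): the fraction of pixels estimated, and the
    fraction of those estimates with disparity error below err_thresh.
    """
    estimated = confidence >= conf_thresh
    coverage = estimated.mean()
    if coverage == 0:
        return 0.0, float("nan")
    err = np.abs(pred[estimated] - gt[estimated])
    accuracy = (err < err_thresh).mean()
    return coverage, accuracy

# Hypothetical four-pixel example: per-pixel disparities and confidences
pred = np.array([10.2, 12.9, 8.1, 15.0])
gt = np.array([10.0, 12.0, 8.0, 9.0])
conf = np.array([0.9, 0.4, 0.8, 0.3])

cov_strict, acc_strict = coverage_accuracy(pred, gt, conf, conf_thresh=0.5)
cov_full, acc_full = coverage_accuracy(pred, gt, conf, conf_thresh=0.0)
```

With the strict threshold only half the pixels are estimated but all of them are accurate; with no threshold every pixel is covered but accuracy drops.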

What "Good" vs "Bad" Metric Values Look Like for Stereo Vision

Good stereo vision model:

  • Mean disparity error < 1 pixel
  • RMSE of depth less than a few centimeters (depending on scene scale)
  • High percentage (> 90%) of pixels with low error
  • Consistent depth maps without large holes or noise

Bad stereo vision model:

  • Mean disparity error > 3 pixels
  • Large noisy or missing depth areas
  • Depth estimates that do not align with real scene geometry
  • High variance in error across the image

Common Metric Pitfalls in Stereo Vision

  • Ignoring occlusions: Some pixels are visible in one camera but not the other, causing errors that should not be counted as model faults.
  • Using only average error: Average can hide large errors in small regions; look at error distribution too.
  • Data leakage: Training on images too similar to test images inflates performance.
  • Overfitting: Model performs well on training scenes but poorly on new scenes.
  • Ignoring scale: Depth error in pixels must be converted to real-world units for meaningful evaluation.
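Two of these pitfalls can be sketched in code: masking occluded pixels out of the average, and converting pixel disparities to metric depth via the standard pinhole relation Z = f * B / d. The function names and calibration numbers (700 px focal length, 12 cm baseline) are hypothetical.

```python
import numpy as np

def masked_disparity_error(pred, gt, occluded):
    """Mean absolute disparity error over non-occluded pixels with valid GT."""
    valid = (~occluded) & (gt > 0)
    return np.abs(pred[valid] - gt[valid]).mean()

def depth_from_disparity(disp, focal_px, baseline_m):
    """Convert disparity in pixels to metric depth: Z = f * B / d."""
    disp = np.asarray(disp, dtype=float)
    safe = np.maximum(disp, 1e-6)  # guard against division by zero
    return np.where(disp > 0, focal_px * baseline_m / safe, np.inf)

# Occlusion pitfall: the third pixel is occluded, so its large error
# should not count against the model.
pred = np.array([10.0, 12.0, 30.0])
gt = np.array([10.5, 12.0, 20.0])
occluded = np.array([False, False, True])
mae_visible = masked_disparity_error(pred, gt, occluded)

# Scale pitfall: the same disparity maps to different metric depths
# depending on calibration (hypothetical: f = 700 px, B = 0.12 m).
depth = depth_from_disparity([8.4], focal_px=700.0, baseline_m=0.12)
```

Note that because depth is inversely proportional to disparity, a fixed 1-pixel disparity error corresponds to a much larger metric depth error for far objects than for near ones.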

Self-Check Question

Your stereo vision model has a mean disparity error of 0.8 pixels on test images but misses depth estimates on 30% of pixels. Is this good?

Answer: It depends on your application. The low error means the depth it does predict is accurate, but missing 30% of pixels leaves the depth maps incomplete. For tasks that need dense depth (like 3D reconstruction), this is a problem; for tasks that focus on key regions, it may be acceptable.

Key Result
Mean disparity error and coverage percentage are key metrics to evaluate stereo vision accuracy and completeness.