
Stereo vision concept in Computer Vision - Model Metrics & Evaluation

Metrics & Evaluation - Stereo vision concept
Which metric matters for Stereo Vision and WHY

Stereo vision estimates depth by comparing two images from slightly different views. The key metric is disparity error, which measures how close the estimated pixel shifts are to the true shifts. Lower disparity error means more accurate depth perception.

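In machine learning models for stereo vision, Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE) is computed on disparity values. These show how far the predicted disparity is from the ground-truth disparity, in pixels. Because depth is derived from disparity, they also indicate how accurate the recovered depth will be.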

Why? Because the goal is to get depth right, and depth is computed from disparity, so the gap between predicted and ground-truth disparity is the most direct measure of how well the model works.
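As a minimal sketch of these two metrics, the snippet below computes MAE and RMSE over paired disparity values. The function names and the five sample values are hypothetical and chosen only for illustration; a real evaluation would run over every valid pixel of a disparity map.

```python
import math

def disparity_mae(pred, truth):
    """Mean absolute disparity error in pixels over paired values."""
    errors = [abs(p - t) for p, t in zip(pred, truth)]
    return sum(errors) / len(errors)

def disparity_rmse(pred, truth):
    """Root mean squared disparity error in pixels; large misses weigh more."""
    squared = [(p - t) ** 2 for p, t in zip(pred, truth)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical disparity values (pixels) for five pixels.
truth = [20.0, 18.0, 35.0, 12.0, 40.0]
pred = [20.5, 17.0, 35.0, 12.5, 44.0]

mae = disparity_mae(pred, truth)
rmse = disparity_rmse(pred, truth)
print(f"MAE: {mae:.2f} px, RMSE: {rmse:.2f} px")
```

In this sample a single 4-pixel miss pulls RMSE (about 1.87) well above MAE (1.2), which is one reason the two are often reported together.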

Confusion Matrix or Equivalent Visualization

Stereo vision is a regression task, not a classification task, so a confusion matrix does not apply directly.

Instead, we use an error distribution table or histogram showing how many pixels have disparity error within certain ranges.

Disparity Error (pixels) | Number of Pixels
-------------------------|-----------------
0 to < 1                 | 8500
1 to < 2                 | 1200
2 to < 3                 | 200
3 or more                | 100
Total                    | 10000

This shows that 85% of pixels have an error below 1 pixel and only 1% are off by 3 pixels or more, which indicates good depth estimation.
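A table like the one above can be produced by binning per-pixel absolute errors. The sketch below assumes half-open bins (an error of exactly 1 pixel falls in the second bin) and uses made-up error values; `error_histogram` is a hypothetical helper, not part of any library.

```python
from bisect import bisect_right

def error_histogram(abs_errors, edges=(0.0, 1.0, 2.0, 3.0)):
    """Count absolute errors per bin [edges[i], edges[i+1]); the last bin is open-ended.

    Assumes every error is >= edges[0].
    """
    edges = list(edges)
    counts = [0] * len(edges)
    for err in abs_errors:
        # bisect_right returns how many edges are <= err, so subtract 1 for the bin index.
        counts[bisect_right(edges, err) - 1] += 1
    return counts

# Hypothetical absolute disparity errors (pixels) for eight pixels.
abs_errors = [0.2, 0.9, 1.0, 1.5, 2.4, 3.7, 0.0, 5.1]
counts = error_histogram(abs_errors)
shares = [c / len(abs_errors) for c in counts]
print(counts)
print(shares)
```

Reporting the share of pixels in each bin alongside the raw counts makes results comparable across images of different sizes.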

Precision vs Recall Tradeoff (or Equivalent) with Examples

In stereo vision, the tradeoff is between accuracy and completeness of depth estimation.

If the model is very strict, it may only estimate depth where it is very confident, leading to high accuracy but low coverage. This means fewer pixels have depth but those are very accurate.

If the model tries to estimate depth everywhere, it may have high coverage but lower accuracy, because some estimates are wrong.

Example:

  • High accuracy, low coverage: 95% of the estimated pixels have error < 1 pixel, but only 70% of image pixels receive a depth estimate.
  • High coverage, lower accuracy: 100% of pixels receive a depth estimate, but only 80% of them have error < 1 pixel.

The right choice depends on the application. For robot navigation, for example, accurate depth on the pixels that matter usually outweighs full coverage.
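The tradeoff can be made concrete with a confidence threshold. The sketch below assumes the model emits a per-pixel confidence score, which is an assumption for illustration rather than something every stereo model provides. The function name and sample values are hypothetical.

```python
def accuracy_and_coverage(pred, truth, confidence, min_conf, max_err=1.0):
    """Return (accuracy, coverage) after dropping low-confidence pixels.

    coverage: fraction of all pixels that keep a disparity estimate.
    accuracy: fraction of kept pixels whose absolute error is below max_err.
    """
    kept = [(p, t) for p, t, c in zip(pred, truth, confidence) if c >= min_conf]
    coverage = len(kept) / len(pred)
    if not kept:
        return 0.0, 0.0
    accurate = sum(1 for p, t in kept if abs(p - t) < max_err)
    return accurate / len(kept), coverage

# Hypothetical disparities (pixels) and confidences for ten pixels.
truth = [10.0, 12.0, 15.0, 20.0, 25.0, 30.0, 8.0, 40.0, 22.0, 18.0]
pred = [10.2, 12.5, 15.1, 23.0, 25.3, 30.4, 11.0, 40.2, 22.6, 18.1]
conf = [0.9, 0.8, 0.95, 0.3, 0.85, 0.7, 0.2, 0.9, 0.6, 0.75]

everywhere = accuracy_and_coverage(pred, truth, conf, min_conf=0.0)
strict = accuracy_and_coverage(pred, truth, conf, min_conf=0.5)
print("estimate everywhere (accuracy, coverage):", everywhere)
print("confident pixels only (accuracy, coverage):", strict)
```

In this sample, keeping every pixel gives 80% accuracy at 100% coverage, while dropping the two low-confidence pixels gives 100% accuracy at 80% coverage.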

What "Good" vs "Bad" Metric Values Look Like for Stereo Vision

Good stereo vision model:

  • Mean disparity error < 1 pixel
  • RMSE of depth less than a few centimeters (depending on scene scale)
  • High percentage (> 90%) of pixels with low error
  • Consistent depth maps without large holes or noise

Bad stereo vision model:

  • Mean disparity error > 3 pixels
  • Large noisy or missing depth areas
  • Depth estimates that do not align with real scene geometry
  • High variance in error across the image

Common Metrics Pitfalls in Stereo Vision

  • Ignoring occlusions: Some pixels are visible in one camera but not the other, so no true match exists for them. Errors there should be reported separately or masked out rather than counted as model faults.
  • Using only average error: An average can hide large errors concentrated in small regions; look at the error distribution too.
  • Data leakage: Training on images too similar to the test images inflates measured performance.
  • Overfitting: The model performs well on training scenes but poorly on new scenes.
  • Ignoring scale: Disparity error in pixels must be converted to real-world depth units for meaningful evaluation, because the same pixel error causes a much larger depth error for distant objects.
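To illustrate the scale pitfall, the sketch below converts a disparity error into a depth error using the standard relation for a rectified stereo pair, Z = f * B / d. The focal length (700 px) and baseline (0.12 m) are hypothetical values picked for the example, and the function names are illustrative.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

def depth_error(true_disp_px, pred_disp_px, focal_px, baseline_m):
    """Absolute depth error in meters caused by a disparity error."""
    z_true = depth_from_disparity(true_disp_px, focal_px, baseline_m)
    z_pred = depth_from_disparity(pred_disp_px, focal_px, baseline_m)
    return abs(z_pred - z_true)

# Hypothetical rig: 700 px focal length, 0.12 m baseline.
FOCAL_PX, BASELINE_M = 700.0, 0.12

# The same 1-pixel underestimate at two different distances.
near = depth_error(42.0, 41.0, FOCAL_PX, BASELINE_M)  # true depth is 2 m
far = depth_error(8.4, 7.4, FOCAL_PX, BASELINE_M)     # true depth is 10 m
print(f"near: {near:.3f} m, far: {far:.3f} m")
```

With these assumed parameters, a 1-pixel error costs roughly 5 cm at 2 m but about 1.35 m at 10 m, so a single pixel-based threshold does not mean the same thing everywhere in the scene.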
Self-Check Question

Your stereo vision model has a mean disparity error of 0.8 pixels on test images but misses depth estimates on 30% of pixels. Is this good?

Answer: It depends on your application. The low error means the depth it does predict is accurate. But missing 30% of the pixels means the depth maps are incomplete. For tasks needing full depth (like 3D reconstruction), this is a problem. For tasks focusing on key areas, it might be acceptable.

Key Result
Mean disparity error and coverage percentage are key metrics to evaluate stereo vision accuracy and completeness.

Practice

1. What is the main purpose of stereo vision in computer vision?
easy
A. To estimate the depth of objects by comparing two images
B. To enhance the color of images
C. To detect edges in a single image
D. To compress images for storage

Solution

  1. Step 1: Understand stereo vision basics

    Stereo vision uses two images taken from slightly different viewpoints to find depth information.
  2. Step 2: Identify the main goal

    The main goal is to estimate how far objects are by comparing their positions in the two images.
  3. Final Answer:

    To estimate the depth of objects by comparing two images -> Option A
  4. Quick Check:

    Stereo vision = Depth estimation [OK]
Hint: Stereo vision = depth from two images [OK]
Common Mistakes:
  • Confusing stereo vision with color enhancement
  • Thinking it works with only one image
  • Mixing depth estimation with edge detection
2. Which of the following correctly describes 'disparity' in stereo vision?
easy
A. The difference in brightness between two images
B. The average color value of an image
C. The difference in pixel positions of the same point in two images
D. The total number of pixels in an image

Solution

  1. Step 1: Define disparity in stereo vision

    Disparity is the horizontal difference in pixel positions of the same object point between the left and right images.
  2. Step 2: Match the correct description

    Disparity is not about brightness or color; it is the position difference that is used to calculate depth.
  3. Final Answer:

    The difference in pixel positions of the same point in two images -> Option C
  4. Quick Check:

    Disparity = pixel position difference [OK]
Hint: Disparity = position difference between images [OK]
Common Mistakes:
  • Confusing disparity with brightness or color
  • Thinking disparity is total pixels count
  • Mixing disparity with image resolution
3. Given two stereo images, a point appears at x = 150 in the left image and at x = 130 in the right image. What is the disparity value?
medium
A. 150
B. -20
C. 280
D. 20

Solution

  1. Step 1: Calculate disparity from pixel positions

    Disparity = x_left - x_right = 150 - 130 = 20 pixels.
  2. Step 2: Interpret the result

    Disparity is positive and represents how far the point shifted between images.
  3. Final Answer:

    20 -> Option D
  4. Quick Check:

    150 - 130 = 20 [OK]
Hint: Disparity = left x minus right x [OK]
Common Mistakes:
  • Reversing the subtraction order (x_right - x_left), which gives -20
  • Using sum instead of difference
  • Ignoring sign of disparity
4. You wrote code to compute disparity but always get zero values. What is the most likely error?
medium
A. Swapping the x and y coordinates in calculations
B. Using the same image for both left and right inputs
C. Using color images instead of grayscale
D. Calculating disparity as the sum of pixel positions

Solution

  1. Step 1: Analyze zero disparity cause

    If both images are identical, the pixel positions match exactly, so disparity is zero everywhere.
  2. Step 2: Check other options

    Swapping coordinates or using color images won't make every disparity zero; summing positions gives wrong values, but not zero everywhere.
  3. Final Answer:

    Using the same image for both left and right inputs -> Option B
  4. Quick Check:

    Same images = zero disparity [OK]
Hint: Different images needed for disparity [OK]
Common Mistakes:
  • Using identical images for stereo input
  • Mixing x and y coordinates without correction
  • Ignoring image color format effects
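The zero-disparity symptom can be reproduced with a toy scanline matcher. The sketch below is a simplified sum-of-absolute-differences (SAD) block matcher over one rectified row, written only for illustration; it is not the code the question refers to, and `match_row` and the sample rows are hypothetical.

```python
def match_row(left_row, right_row, max_disp, half_win=1):
    """Per-pixel disparity along one rectified scanline using SAD block matching."""
    n = len(left_row)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0
            for k in range(-half_win, half_win + 1):
                xl = min(max(x + k, 0), n - 1)      # clamp indices at the borders
                xr = min(max(x + k - d, 0), n - 1)  # candidate match sits d pixels to the left
                cost += abs(left_row[xl] - right_row[xr])
            if cost < best_cost:  # strict comparison: ties keep the smaller disparity
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities

right = [3, 7, 1, 9, 4, 8, 2, 6, 5, 0]
left = [0, 0] + right[:-2]  # same content shifted right by 2 pixels

same_input = match_row(left, left, max_disp=3)     # bug: same image passed twice
proper_input = match_row(left, right, max_disp=3)  # correct left/right pair
print(same_input)
print(proper_input[3:9])  # interior pixels, away from border effects
```

Passing the same row twice yields zero everywhere because the zero-shift window always matches perfectly, while the proper pair recovers the 2-pixel shift on interior pixels.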
5. In a stereo vision system, if an object is very far away, how does its disparity value change and why?
hard
A. Disparity decreases because the pixel positions in both images become closer
B. Disparity increases because the object appears larger
C. Disparity stays the same regardless of distance
D. Disparity becomes negative because the object moves behind the cameras

Solution

  1. Step 1: Understand disparity-distance relation

    Disparity is inversely proportional to distance (for rectified cameras, disparity = focal length × baseline / depth), so far objects have smaller disparity: their positions in the two images are closer together.
  2. Step 2: Eliminate incorrect options

    Disparity does not increase with distance, nor does it become negative or stay constant.
  3. Final Answer:

    Disparity decreases because the pixel positions in both images become closer -> Option A
  4. Quick Check:

    Far object = small disparity [OK]
Hint: Farther objects have smaller disparity [OK]
Common Mistakes:
  • Assuming disparity grows with distance
  • Thinking disparity can be negative for far objects
  • Believing disparity is constant
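The inverse relation behind question 5 can be checked numerically. The sketch below assumes a rectified stereo pair with a hypothetical 700 px focal length and 0.12 m baseline, and evaluates d = f * B / Z at increasing distances.

```python
def disparity_from_depth(depth_m, focal_px, baseline_m):
    """Disparity in pixels for a rectified stereo pair: d = f * B / Z."""
    return focal_px * baseline_m / depth_m

# Hypothetical rig parameters; only the trend matters here.
depths_m = (1.0, 2.0, 10.0, 100.0)
disparities = [
    disparity_from_depth(z, focal_px=700.0, baseline_m=0.12) for z in depths_m
]

for z, d in zip(depths_m, disparities):
    print(f"{z:6.1f} m -> {d:6.2f} px")
```

The disparity shrinks toward zero as the object moves away but stays positive, which matches option A and rules out option D.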