Stereo vision estimates depth by comparing two images of the same scene taken from slightly different viewpoints. The key metric is disparity error, which measures how far the estimated disparities (the horizontal pixel shifts between matching points in the two images) are from the true disparities. Lower disparity error means more accurate depth estimates.
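To make the link between disparity and depth concrete, here is a minimal sketch. It assumes a rectified stereo pair, where depth follows Z = f * B / d (f is the focal length in pixels, B is the baseline between the cameras, d is the disparity in pixels). The function name and the camera values (700 px focal length, 0.12 m baseline) are hypothetical, chosen only for illustration.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero or negative disparity has no finite depth under this model.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera parameters (assumptions, not from any dataset).
FOCAL_PX = 700.0
BASELINE_M = 0.12

depth_true = disparity_to_depth(42.0, FOCAL_PX, BASELINE_M)  # true disparity
depth_est = disparity_to_depth(40.0, FOCAL_PX, BASELINE_M)   # estimate off by 2 px

print(f"true depth: {depth_true:.3f} m, estimated depth: {depth_est:.3f} m")
```

With these assumed values, a 2-pixel disparity error moves the depth estimate from about 2.0 m to about 2.1 m. Because depth is inversely proportional to disparity, the same pixel error causes a larger depth error for distant points, where the disparity is small.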
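In machine learning models for stereo vision, the usual metrics are Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), computed on disparity values. MAE averages the absolute differences between predicted and ground-truth disparity. RMSE takes the square root of the mean squared difference, so it penalizes large errors more heavily. Because depth is derived from disparity, these values indicate how far the predicted depth will be from the real depth.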
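The two metrics can be sketched in a few lines. This example assumes NumPy is available and that ground-truth pixels with a non-positive or non-finite value have no measurement and should be skipped. That masking convention and the function name `disparity_errors` are assumptions for illustration, since datasets mark invalid pixels in different ways.

```python
import numpy as np


def disparity_errors(pred, gt, valid=None):
    """Return (MAE, RMSE) in pixels over the valid ground-truth pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    if valid is None:
        # Assumed convention: gt <= 0 or non-finite means "no ground truth".
        valid = np.isfinite(gt) & (gt > 0)
    else:
        valid = np.asarray(valid, dtype=bool)
    diff = pred[valid] - gt[valid]
    if diff.size == 0:
        raise ValueError("no valid pixels to evaluate")
    mae = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return mae, rmse


# Tiny 2x2 disparity maps; the 0 in gt marks a pixel without ground truth.
gt = [[10.0, 20.0], [0.0, 40.0]]
pred = [[11.0, 18.0], [5.0, 40.0]]

mae, rmse = disparity_errors(pred, gt)
print(f"MAE: {mae:.3f} px, RMSE: {rmse:.3f} px")
```

In this toy case the three valid pixels have errors of 1, -2, and 0 pixels, so the MAE is 1.0 px and the RMSE is about 1.291 px. The RMSE is higher because the single 2-pixel error is weighted more heavily.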
Why? Because the goal is to recover correct depth, and depth is computed directly from disparity. Measuring the difference between predicted and ground-truth disparity is therefore a direct check on whether the model works well.