Why Metrics Matter: Preventing Misuse in Computer Vision

In responsible computer vision (CV), accuracy and fairness metrics matter most. Accuracy measures how often the model's predictions are correct. Fairness metrics check whether the model treats all groups equally, guarding against bias. Together they help prevent misuse such as discriminatory or unfair decisions.
Actual \ Predicted | Positive | Negative
-------------------|----------|---------
Positive           | TP       | FN
Negative           | FP       | TN
TP = True Positive: Correct positive predictions
FP = False Positive: Wrong positive predictions
TN = True Negative: Correct negative predictions
FN = False Negative: Wrong negative predictions
This matrix helps us see errors that can cause misuse, like wrongly labeling someone, which can lead to unfair treatment.
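The four cells above can be tallied directly from labels and predictions. This is a minimal sketch; the label convention (1 = positive, 0 = negative) and the sample data are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) counts for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

# Hypothetical labels and predictions for six samples
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 2, 1)
```

Each misclassified sample lands in exactly one of the FP or FN cells, which is what makes the matrix useful for spotting the specific error type behind a potential misuse.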
Precision is the fraction of predicted positives that are actually correct: TP / (TP + FP). Recall is the fraction of actual positives the model found: TP / (TP + FN).
In CV misuse prevention, high precision avoids false alarms (like wrongly accusing someone), and high recall avoids missing real issues (like missing harmful content).
For example, a face recognition system should have high precision to avoid misidentifying people, and high recall to catch all authorized users.
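The two formulas above can be computed from the confusion-matrix counts. A minimal sketch; the example counts (90 true matches, 10 false matches, 30 missed matches) are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Compute precision = TP/(TP+FP) and recall = TP/(TP+FN), guarding division by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical face-recognition counts: 90 correct matches, 10 false matches, 30 missed
p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r)  # 0.9 0.75
```

Here 90% precision means few wrongful identifications, while 75% recall means a quarter of authorized users would still be missed.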
Good: Balanced precision and recall above 85%, low bias across groups, and consistent accuracy.
Bad: High accuracy but low recall or precision, showing the model misses or wrongly flags many cases. Also, large differences in performance between groups indicate unfairness.
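Checking for large performance differences between groups can be done by computing the metric per group and comparing the extremes. A minimal sketch, assuming records arrive as (group, true label, predicted label) tuples; the group names and data are hypothetical.

```python
def group_accuracies(records):
    """records: iterable of (group, y_true, y_pred) tuples. Returns accuracy per group."""
    totals, correct = {}, {}
    for g, t, p in records:
        totals[g] = totals.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (t == p)
    return {g: correct[g] / totals[g] for g in totals}

# Hypothetical evaluation records for two demographic groups
records = [("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
           ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 0, 0)]
acc = group_accuracies(records)
gap = max(acc.values()) - min(acc.values())
print(acc, gap)  # {'A': 0.75, 'B': 0.5} 0.25
```

A gap of 25 percentage points between groups, as in this toy data, would be a clear fairness red flag even if the overall accuracy looked acceptable.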
- Accuracy paradox: High accuracy can hide poor performance if data is imbalanced (e.g., many negatives, few positives).
- Data leakage: When test data leaks into training, metrics look better than they should, but the model fails in real use.
- Overfitting: Model performs well on training but poorly on new data, misleading metrics.
- Ignoring fairness: Good overall metrics but poor results for some groups cause misuse and harm.
No, it is not good. The model misses 88% of fraud cases (a recall of only 12%), so most fraud goes undetected. The high accuracy is misleading: because fraud is rare, the model can score well simply by predicting "no fraud" almost every time, yet it fails where it matters most.
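The accuracy paradox in this answer can be made concrete with arithmetic. The specific counts below are hypothetical, chosen to match a 12% recall: 1,000 transactions, 50 of them fraudulent, with the model catching only 6 frauds and raising no false alarms.

```python
# Hypothetical fraud-detection counts illustrating the accuracy paradox
tp, fn = 6, 44    # frauds caught vs. frauds missed (50 frauds total)
tn, fp = 950, 0   # legitimate transactions, all predicted correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.1%}, recall={recall:.0%}")  # accuracy=95.6%, recall=12%
```

Accuracy near 96% looks impressive, but it is driven almost entirely by the abundant negatives; recall is the metric that exposes the failure on the rare positive class.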