Fairness in face recognition in Computer Vision - Model Metrics & Evaluation

In face recognition, fairness means the model performs equally well for all demographic groups, such as groups defined by skin tone, age, or gender. To check fairness, we compute the False Positive Rate (FPR) and False Negative Rate (FNR) separately for each group; if one group suffers substantially more errors, the model is unfair. We can also compare groups using the Equal Error Rate (EER) and Demographic Parity. Together, these metrics reveal whether the model treats all groups equitably.
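As a minimal sketch of one of these metrics, the demographic parity difference is the gap between groups' rates of positive ("match") predictions. The prediction arrays and group labels below are illustrative assumptions, not data from this article:

```python
# Minimal sketch of the demographic parity difference: the gap between
# groups' positive ("match") prediction rates. Arrays are illustrative.
def positive_rate(preds):
    """Fraction of predictions that are positive (1 = predicted 'match')."""
    return sum(preds) / len(preds)

group_a_preds = [1, 1, 0, 1, 0, 1, 1, 0]  # positive rate 5/8 = 0.625
group_b_preds = [1, 0, 0, 0, 0, 1, 0, 0]  # positive rate 2/8 = 0.25

dp_diff = abs(positive_rate(group_a_preds) - positive_rate(group_b_preds))
print(dp_diff)  # 0.375
```

A demographic parity difference near 0 means both groups receive positive predictions at similar rates; larger gaps suggest the model favors one group.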
Group A confusion matrix (200 samples):
  TP = 90   FP = 10
  FN = 5    TN = 95

Group B confusion matrix (200 samples):
  TP = 70   FP = 30
  FN = 20   TN = 80
Calculations for Group A:
Precision = 90 / (90 + 10) = 0.9
Recall = 90 / (90 + 5) ≈ 0.947
FPR = 10 / (10 + 95) ≈ 0.095
FNR = 5 / (5 + 90) ≈ 0.053

Calculations for Group B:
Precision = 70 / (70 + 30) = 0.7
Recall = 70 / (70 + 20) ≈ 0.778
FPR = 30 / (30 + 80) ≈ 0.273
FNR = 20 / (20 + 70) ≈ 0.222
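These per-group calculations can be sketched directly from the confusion-matrix counts:

```python
# Sketch: per-group fairness metrics from the confusion matrices above.
def group_metrics(tp, fp, fn, tn):
    """Compute precision, recall, FPR, and FNR from raw counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }

group_a = group_metrics(tp=90, fp=10, fn=5, tn=95)
group_b = group_metrics(tp=70, fp=30, fn=20, tn=80)

for name, m in [("A", group_a), ("B", group_b)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
# A: precision 0.9, recall 0.947, fpr 0.095, fnr 0.053
# B: precision 0.7, recall 0.778, fpr 0.273, fnr 0.222
```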
Notice that Group B has lower recall (0.778 vs. 0.947) and a much higher FPR (0.273 vs. 0.095): the model makes more errors of both kinds for Group B, which indicates unfairness.
Imagine a face recognition system for unlocking phones. If it has high precision but low recall for a group, it rarely mistakes other people for the enrolled user (good) but often fails to recognize the real user (bad), frustrating users in that group.
On the other hand, if recall is high but precision is low, the system might unlock for wrong people in that group, risking security.
Fairness means balancing these so no group suffers more false rejections or false acceptances than others.
Good fairness: Similar precision, recall, FPR, and FNR across all groups. For example, all groups have recall around 0.9 and FPR around 0.05.
Bad fairness: One group has recall 0.95 but another 0.6, or one group's FPR is 0.01 but another's is 0.3. This means the model is biased and treats groups unequally.
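One way to operationalize this comparison is to flag each metric whose max-min gap across groups exceeds a threshold. The sketch below uses the numbers from the example above; the 0.1 threshold is an illustrative assumption, not an established standard:

```python
# Sketch: flag a metric as unfair when its gap across groups exceeds a
# threshold. The 0.1 threshold is an illustrative assumption.
def fairness_gaps(per_group, threshold=0.1):
    """For each metric, report the max-min gap across groups and a flag."""
    metric_names = next(iter(per_group.values())).keys()
    result = {}
    for name in metric_names:
        values = [m[name] for m in per_group.values()]
        gap = max(values) - min(values)
        result[name] = {"gap": round(gap, 3), "fair": gap <= threshold}
    return result

per_group = {
    "A": {"recall": 0.947, "fpr": 0.095},
    "B": {"recall": 0.778, "fpr": 0.273},
}
print(fairness_gaps(per_group))
# recall gap 0.169 and fpr gap 0.178 both exceed 0.1, so both are flagged
```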
- Ignoring group differences: Reporting only overall metrics hides whether some groups perform poorly.
- Data imbalance: If some groups have far fewer samples, their metrics are noisy and can be misleading.
- Overfitting to the majority group: The model may perform well on large groups but poorly on minorities.
- Relying on accuracy alone: Overall accuracy can stay high even when the model fails a minority group, because the majority group dominates the average.
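The last pitfall can be demonstrated with a quick sketch: overall accuracy stays high while the minority group's recall collapses. The counts below are illustrative assumptions:

```python
# Sketch: high overall accuracy hiding poor minority-group recall.
# Counts are illustrative assumptions.
majority = {"tp": 900, "fp": 10, "fn": 10, "tn": 80}   # 1000 samples
minority = {"tp": 10, "fp": 2, "fn": 10, "tn": 28}     # 50 samples

def accuracy(c):
    return (c["tp"] + c["tn"]) / sum(c.values())

def recall(c):
    return c["tp"] / (c["tp"] + c["fn"])

# Pool the counts as a single "overall" evaluation would.
overall = {k: majority[k] + minority[k] for k in majority}
print(round(accuracy(overall), 3))  # ~0.97 overall accuracy
print(recall(minority))             # 0.5 recall for the minority group
```

The pooled accuracy looks excellent, yet half of the minority group's genuine users are missed, which only a per-group breakdown reveals.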
Your face recognition model has 98% overall accuracy but only 50% recall for a minority group. Is it good for production? Why or why not?
Answer: No. Even though overall accuracy is high, 50% recall for the minority group means half of the genuine users in that group fail to be recognized. This is unfair, harms the user experience for that group, and should be fixed before deployment.