
Fairness in face recognition in Computer Vision - Model Metrics & Evaluation

Which metrics matter for fairness in face recognition, and why

In face recognition, fairness means the model works equally well for all demographic groups, such as different skin tones, ages, or genders. To check this, we compute the False Positive Rate (FPR) and False Negative Rate (FNR) separately for each group: if one group suffers many more errors than the others, the model is unfair. We also compare groups using the Equal Error Rate (EER) and demographic parity. Together, these metrics reveal whether the model treats everyone equally.

Confusion matrix example for two groups
Group A confusion matrix:
  TP = 90  FP = 10
  FN = 5   TN = 95

Group B confusion matrix:
  TP = 70  FP = 30
  FN = 20  TN = 80

Total samples per group = 200

Calculations for Group A:
  Precision = 90 / (90 + 10) = 0.9
  Recall = 90 / (90 + 5) = 0.947
  FPR = 10 / (10 + 95) = 0.095

Calculations for Group B:
  Precision = 70 / (70 + 30) = 0.7
  Recall = 70 / (70 + 20) = 0.778
  FPR = 30 / (30 + 80) = 0.273

Notice that Group B has lower precision and recall and a much higher FPR, which signals unfairness between the two groups.
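The calculations above can be reproduced with a small helper. This is a minimal sketch using the confusion-matrix counts from the example; the function name `rates` is just an illustrative choice:

```python
def rates(tp, fp, fn, tn):
    """Return (precision, recall, fpr) for one group's confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)   # also called the true positive rate
    fpr = fp / (fp + tn)      # false positive rate
    return precision, recall, fpr

# Counts (TP, FP, FN, TN) from the two groups above
groups = {
    "A": (90, 10, 5, 95),
    "B": (70, 30, 20, 80),
}

for name, counts in groups.items():
    p, r, f = rates(*counts)
    print(f"Group {name}: precision={p:.3f} recall={r:.3f} FPR={f:.3f}")
# Group A: precision=0.900 recall=0.947 FPR=0.095
# Group B: precision=0.700 recall=0.778 FPR=0.273
```

Computing the metrics per group, rather than once over the pooled data, is the whole point: the pooled numbers would average away the gap between A and B.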
Precision vs Recall tradeoff with fairness examples

Imagine a face recognition system that unlocks phones. High precision but low recall for a group means the system rarely mistakes other people for the enrolled user (good), but often fails to recognize the real user (bad). That frustrates users in that group.

On the other hand, if recall is high but precision is low, the system might unlock for wrong people in that group, risking security.

Fairness means balancing these so no group suffers more false rejections or false acceptances than others.
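The tradeoff comes down to where the match threshold is set. The sketch below uses hypothetical similarity scores (made up for illustration) to show that raising the threshold lowers false acceptances but raises false rejections, and vice versa:

```python
# Hypothetical similarity scores for one user (illustrative, not real data)
genuine = [0.62, 0.71, 0.80, 0.85, 0.91]    # same-person comparisons
impostor = [0.20, 0.35, 0.48, 0.66, 0.74]   # different-person comparisons

for threshold in (0.5, 0.7, 0.9):
    # False rejection rate: genuine attempts scored below the threshold
    frr = sum(s < threshold for s in genuine) / len(genuine)
    # False acceptance rate: impostor attempts scored at or above it
    far = sum(s >= threshold for s in impostor) / len(impostor)
    print(f"threshold={threshold}: FRR={frr:.1f} FAR={far:.1f}")
# threshold=0.5: FRR=0.0 FAR=0.4
# threshold=0.7: FRR=0.2 FAR=0.2
# threshold=0.9: FRR=0.8 FAR=0.0
```

If one demographic group's genuine scores sit systematically lower than another's, a single global threshold will reject that group more often, which is exactly the imbalance fairness evaluation is meant to catch.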

What "good" vs "bad" metric values look like for fairness

Good fairness: Similar precision, recall, FPR, and FNR across all groups. For example, all groups have recall around 0.9 and FPR around 0.05.

Bad fairness: One group has recall 0.95 but another 0.6, or one group's FPR is 0.01 but another's is 0.3. This means the model is biased and treats groups unequally.
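One simple way to operationalize "similar across groups" is to measure the gap between the best- and worst-performing group and compare it to a tolerance. The tolerance value below is an assumption chosen for illustration; in practice it is set per application:

```python
def disparity(per_group):
    """Gap between the best and worst group for one metric."""
    return max(per_group.values()) - min(per_group.values())

# Per-group values from the worked example above
recall_by_group = {"A": 0.947, "B": 0.778}
fpr_by_group = {"A": 0.095, "B": 0.273}

TOLERANCE = 0.05  # acceptable gap; an assumption, tune per application

for name, metric in (("recall", recall_by_group), ("FPR", fpr_by_group)):
    gap = disparity(metric)
    verdict = "OK" if gap <= TOLERANCE else "UNFAIR"
    print(f"{name} gap = {gap:.3f} -> {verdict}")
# recall gap = 0.169 -> UNFAIR
# FPR gap = 0.178 -> UNFAIR
```

Both gaps far exceed the tolerance, matching the "bad fairness" pattern described above.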

Common pitfalls in fairness metrics
  • Ignoring group differences: Reporting only overall accuracy hides if some groups have poor results.
  • Data imbalance: If some groups have fewer samples, metrics can be misleading.
  • Overfitting to majority group: Model may perform well on large groups but poorly on minorities.
  • Using accuracy alone: Overall accuracy can stay high even while the model fails on minority groups, so it says nothing about fairness on its own.
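The last pitfall is easy to demonstrate with the confusion-matrix counts from the worked example: pooling both groups yields a respectable overall accuracy that hides the gap between them.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all decisions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# (TP, FP, FN, TN) counts from Groups A and B above
a = (90, 10, 5, 95)
b = (70, 30, 20, 80)

# Pooled counts: element-wise sum of the two groups
overall = accuracy(*(x + y for x, y in zip(a, b)))
print(f"overall accuracy = {overall:.3f}")       # 0.838 - looks fine
print(f"Group A accuracy = {accuracy(*a):.3f}")  # 0.925
print(f"Group B accuracy = {accuracy(*b):.3f}")  # 0.750 - hidden by the pooled number
```

Reporting only the pooled 0.838 would hide the 17.5-point accuracy gap between the groups, which is why per-group metrics are essential.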
Self-check question

Your face recognition model has 98% overall accuracy but only 50% recall for a minority group. Is it good for production? Why or why not?

Answer: No, it is not good. Even though overall accuracy is high, the low recall for the minority group means many real users in that group are not recognized. This is unfair and harms user experience for that group.

Key Result
Fairness in face recognition requires similar precision, recall, and error rates across all demographic groups to ensure equal treatment.