In face recognition, fairness means the model performs comparably well across demographic groups, such as groups defined by skin tone, age, or gender. To check this, we compute the False Positive Rate (FPR) and False Negative Rate (FNR) separately for each group. If one group has a much higher error rate than the others, the model is unfair to that group. We also compare the Equal Error Rate (EER), the operating point where FPR equals FNR, and Demographic Parity, which asks for similar positive-prediction rates in every group. Together, these metrics show whether the model treats everyone fairly.
Fairness in face recognition in Computer Vision - Model Metrics & Evaluation
Group A confusion matrix:
TP = 90 FP = 10
FN = 5 TN = 95
Group B confusion matrix:
TP = 70 FP = 30
FN = 20 TN = 80
Total samples per group = 200
Calculations for Group A:
Precision = 90 / (90 + 10) = 0.9
Recall = 90 / (90 + 5) = 0.947
FPR = 10 / (10 + 95) = 0.095
Calculations for Group B:
Precision = 70 / (70 + 30) = 0.7
Recall = 70 / (70 + 20) = 0.778
FPR = 30 / (30 + 80) = 0.273
Notice that Group B has lower precision, lower recall, and a higher FPR than Group A, which indicates the model is unfair to Group B.
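The per-group calculations above can be reproduced in a few lines of Python. This is a minimal sketch using the confusion-matrix counts from the example; the helper name `group_rates` and the dictionary layout are illustrative assumptions, and FNR (not computed by hand above) is added as FN / (FN + TP).

```python
# Confusion-matrix counts copied from the Group A / Group B example.
groups = {
    "A": {"TP": 90, "FP": 10, "FN": 5, "TN": 95},
    "B": {"TP": 70, "FP": 30, "FN": 20, "TN": 80},
}


def group_rates(c):
    """Return precision, recall, FPR, and FNR for one confusion matrix."""
    return {
        "precision": c["TP"] / (c["TP"] + c["FP"]),
        "recall": c["TP"] / (c["TP"] + c["FN"]),
        "fpr": c["FP"] / (c["FP"] + c["TN"]),
        "fnr": c["FN"] / (c["FN"] + c["TP"]),  # equals 1 - recall
    }


rates = {name: group_rates(counts) for name, counts in groups.items()}

# Print the rounded rates for each group so they can be compared side by side.
for name, r in rates.items():
    print(name, {k: round(v, 3) for k, v in r.items()})
```

With these counts the FNR comes out to about 0.053 for Group A and 0.222 for Group B, so Group B also sees roughly four times as many false rejections.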
Imagine a face recognition system for unlocking phones. If it has high precision but low recall for a group, it means it rarely mistakes others for that person (good), but often fails to recognize the real user (bad). This frustrates users in that group.
On the other hand, if recall is high but precision is low, the system might unlock for wrong people in that group, risking security.
Fairness means balancing these so no group suffers more false rejections or false acceptances than others.
Good fairness: Similar precision, recall, FPR, and FNR across all groups. For example, all groups have recall around 0.9 and FPR around 0.05.
Bad fairness: One group has recall 0.95 but another 0.6, or one group's FPR is 0.01 but another's is 0.3. This means the model is biased and treats groups unequally.
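One way to turn the "good" versus "bad" comparison into a check is to measure the largest gap between groups for each metric. The sketch below uses the recall and FPR values from the bad-fairness example; the group names, the `max_gap` helper, and the 0.05 tolerance are assumptions chosen for illustration, not a standard.

```python
def max_gap(metric_by_group):
    """Largest difference in one metric between any two groups."""
    values = list(metric_by_group.values())
    return max(values) - min(values)


# Values from the "bad fairness" example; group names are hypothetical.
recall_by_group = {"group_1": 0.95, "group_2": 0.60}
fpr_by_group = {"group_1": 0.01, "group_2": 0.30}

TOLERANCE = 0.05  # assumed acceptable gap; pick this per application

recall_gap = max_gap(recall_by_group)
fpr_gap = max_gap(fpr_by_group)
recall_is_fair = recall_gap <= TOLERANCE
fpr_is_fair = fpr_gap <= TOLERANCE

print(f"recall gap = {recall_gap:.2f}, within tolerance: {recall_is_fair}")
print(f"FPR gap    = {fpr_gap:.2f}, within tolerance: {fpr_is_fair}")
```

Both gaps here are far above the assumed tolerance, which matches the description of a biased model.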
- Ignoring group differences: Reporting only overall accuracy hides if some groups have poor results.
- Data imbalance: If some groups have fewer samples, metrics can be misleading.
- Overfitting to majority group: Model may perform well on large groups but poorly on minorities.
- Using accuracy alone: Accuracy can stay high when the model does well on the majority group, even if it fails on smaller groups, so it says little about fairness.
Your face recognition model has 98% overall accuracy but only 50% recall for a minority group. Is it good for production? Why or why not?
Answer: No, it is not good. Even though overall accuracy is high, the low recall for the minority group means many real users in that group are not recognized. This is unfair and harms user experience for that group.
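To see how this situation can arise, here is a small sketch with made-up counts. The numbers are hypothetical and were chosen only so that overall accuracy is 98% while minority recall is 50%; they do not come from a real model.

```python
# Hypothetical confusion-matrix counts (not real results).
counts = {
    "majority": {"TP": 4810, "FP": 90, "FN": 90, "TN": 4810},
    "minority": {"TP": 20, "FP": 0, "FN": 20, "TN": 160},
}

# Overall accuracy pools every sample, so the large group dominates it.
total = sum(sum(c.values()) for c in counts.values())
correct = sum(c["TP"] + c["TN"] for c in counts.values())
overall_accuracy = correct / total

# Recall is computed per group, which exposes the minority-group failure.
recall = {g: c["TP"] / (c["TP"] + c["FN"]) for g, c in counts.items()}

print(f"overall accuracy: {overall_accuracy:.2f}")
for group, value in recall.items():
    print(f"{group} recall: {value:.2f}")
```

Because the minority group contributes only 200 of the 10,000 samples in this sketch, its 20 missed users barely move the overall accuracy.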
Practice
What does fairness in face recognition mainly aim to achieve?
Solution
Step 1: Understand fairness goal
Fairness means the model should work equally well for all groups, not just some.
Step 2: Identify fairness metric
Accuracy or error rates should be similar across different demographic groups.
Final Answer:
Equal accuracy for all demographic groups -> Option D
Quick Check:
Fairness = Equal accuracy [OK]
- Thinking fairness means faster models
- Confusing fairness with image quality
- Assuming complex models are always fair
Which of the following is the correct way to check fairness in a face recognition model?
metrics = {'group_A': 0.92, 'group_B': 0.85}
# What should we compare?
Solution
Step 1: Identify fairness check
Fairness requires comparing performance metrics across groups.
Step 2: Apply comparison
Compare accuracy or error rates between group_A and group_B to find bias.
Final Answer:
Compare metrics['group_A'] and metrics['group_B'] for equality -> Option B
Quick Check:
Fairness check = Compare group metrics [OK]
- Checking only one group
- Ignoring metrics and focusing on model size
- Comparing to unrelated values
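The comparison from this question can be sketched as follows. Exact equality is rarely achievable in practice, so this version checks whether the two values differ by more than a tolerance; the 0.05 tolerance and the variable names other than `metrics` are assumptions.

```python
metrics = {'group_A': 0.92, 'group_B': 0.85}

TOLERANCE = 0.05  # assumed acceptable difference between groups

# Absolute difference between the two groups' metric values.
gap = abs(metrics['group_A'] - metrics['group_B'])

# The group with the lowest metric is the one the model serves worst.
worse_group = min(metrics, key=metrics.get)

if gap > TOLERANCE:
    print(f"Gap of {gap:.2f} exceeds tolerance; {worse_group} is underserved")
else:
    print(f"Gap of {gap:.2f} is within tolerance")
```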
Consider this Python code snippet evaluating fairness metrics:
group_accuracies = {'A': 0.90, 'B': 0.75, 'C': 0.88}
threshold = 0.80
biased_groups = [g for g, acc in group_accuracies.items() if acc < threshold]
print(biased_groups)
What is the output?
Solution
Step 1: Understand the code logic
The code collects groups with accuracy less than 0.80 into biased_groups.
Step 2: Check each group's accuracy
Group A: 0.90 > 0.80 (not biased), B: 0.75 < 0.80 (biased), C: 0.88 > 0.80 (not biased)
Final Answer:
['B'] -> Option A
Quick Check:
Only group B accuracy < threshold [OK]
- Including groups with accuracy above threshold
- Misreading comparison operator
- Confusing list comprehension output
Find the error in this fairness evaluation code snippet:
metrics = {'group1': 0.85, 'group2': 0.80}
threshold = 0.82
biased = [g for g, v in metrics if v < threshold]
print(biased)
Solution
Step 1: Identify dictionary iteration error
Iterating over a dictionary directly gives keys, not key-value pairs.
Step 2: Fix iteration to use .items()
Use metrics.items() to get (key, value) pairs for comparison.
Final Answer:
Missing .items() when iterating over dictionary -> Option A
Quick Check:
Dictionary iteration needs .items() [OK]
- Iterating dict keys instead of items
- Changing threshold unnecessarily
- Assuming print syntax is wrong
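For reference, here is a runnable sketch of both the broken and the corrected form of this snippet. The `try`/`except` wrapper is added only to show the failure without stopping the script.

```python
metrics = {'group1': 0.85, 'group2': 0.80}
threshold = 0.82

# Broken form: iterating a dict yields its keys (strings here), so
# unpacking each key into (g, v) raises ValueError.
try:
    biased = [g for g, v in metrics if v < threshold]
except ValueError as err:
    print("original snippet fails:", err)

# Corrected form: .items() yields (key, value) pairs.
biased = [g for g, v in metrics.items() if v < threshold]
print(biased)  # ['group2']
```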
You have a face recognition model with accuracy 0.95 on group X and 0.70 on group Y. Which approach best improves fairness?
Solution
Step 1: Identify fairness problem
Model performs worse on group Y, showing bias.
Step 2: Choose best fairness improvement
Balanced data helps model learn features for all groups equally.
Step 3: Evaluate other options
Increasing complexity alone may not fix bias; ignoring group Y is unfair; reducing group X accuracy is not ideal.
Final Answer:
Collect more balanced training data including group Y -> Option C
Quick Check:
Balanced data improves fairness [OK]
- Thinking model complexity fixes bias alone
- Ignoring underperforming groups
- Lowering accuracy on better groups
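Before collecting more data, it helps to measure how unbalanced the training set is. The sketch below counts samples per group and reports how many more each group would need to match the largest one; the group labels and the 800/200 split are hypothetical.

```python
from collections import Counter

# Hypothetical per-sample group labels for a training set.
train_groups = ["X"] * 800 + ["Y"] * 200

# Number of training samples in each group.
counts = Counter(train_groups)

# Samples each group would need to reach the size of the largest group.
target = max(counts.values())
shortfall = {group: target - n for group, n in counts.items()}

for group in sorted(counts):
    print(f"group {group}: {counts[group]} samples, short by {shortfall[group]}")
```

Matching the largest group is only one possible target; what counts as balanced enough depends on the application.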
