Fairness in face recognition in Computer Vision - Model Metrics & Evaluation

In face recognition, fairness means the model performs equally well for all demographic groups, such as groups defined by skin tone, age, or gender. To check fairness, we compute the False Positive Rate (FPR) and False Negative Rate (FNR) separately for each group; if one group suffers substantially more errors, the model is unfair. We can also compare groups using the Equal Error Rate (EER) and Demographic Parity. Together, these metrics reveal whether the model treats all groups equitably.
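As a minimal sketch of one of these metrics, the demographic parity difference is the gap between groups' rates of positive ("match") predictions. The prediction arrays and group labels below are illustrative assumptions, not data from this article:

```python
# Minimal sketch of the demographic parity difference: the gap between
# groups' positive ("match") prediction rates. Arrays are illustrative.
def positive_rate(preds):
    """Fraction of predictions that are positive (1 = predicted 'match')."""
    return sum(preds) / len(preds)

group_a_preds = [1, 1, 0, 1, 0, 1, 1, 0]  # positive rate 5/8 = 0.625
group_b_preds = [1, 0, 0, 0, 0, 1, 0, 0]  # positive rate 2/8 = 0.25

dp_diff = abs(positive_rate(group_a_preds) - positive_rate(group_b_preds))
print(dp_diff)  # 0.375
```

A demographic parity difference near 0 means both groups receive positive predictions at similar rates; larger gaps suggest the model favors one group.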
Group A confusion matrix (200 samples):
  TP = 90   FP = 10
  FN = 5    TN = 95

Group B confusion matrix (200 samples):
  TP = 70   FP = 30
  FN = 20   TN = 80
Calculations for Group A:
Precision = 90 / (90 + 10) = 0.9
Recall = 90 / (90 + 5) ≈ 0.947
FPR = 10 / (10 + 95) ≈ 0.095
FNR = 5 / (5 + 90) ≈ 0.053

Calculations for Group B:
Precision = 70 / (70 + 30) = 0.7
Recall = 70 / (70 + 20) ≈ 0.778
FPR = 30 / (30 + 80) ≈ 0.273
FNR = 20 / (20 + 70) ≈ 0.222
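These per-group calculations can be sketched directly from the confusion-matrix counts:

```python
# Sketch: per-group fairness metrics from the confusion matrices above.
def group_metrics(tp, fp, fn, tn):
    """Compute precision, recall, FPR, and FNR from raw counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }

group_a = group_metrics(tp=90, fp=10, fn=5, tn=95)
group_b = group_metrics(tp=70, fp=30, fn=20, tn=80)

for name, m in [("A", group_a), ("B", group_b)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
# A: precision 0.9, recall 0.947, fpr 0.095, fnr 0.053
# B: precision 0.7, recall 0.778, fpr 0.273, fnr 0.222
```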
Notice that Group B has lower recall (0.778 vs. 0.947) and a much higher FPR (0.273 vs. 0.095): the model makes more errors of both kinds for Group B, which indicates unfairness.
Imagine a face recognition system for unlocking phones. If it has high precision but low recall for a group, it rarely mistakes other people for the enrolled user (good) but often fails to recognize the real user (bad), frustrating users in that group.
On the other hand, if recall is high but precision is low, the system might unlock for wrong people in that group, risking security.
Fairness means balancing these so no group suffers more false rejections or false acceptances than others.
Good fairness: Similar precision, recall, FPR, and FNR across all groups. For example, all groups have recall around 0.9 and FPR around 0.05.
Bad fairness: One group has recall 0.95 but another 0.6, or one group's FPR is 0.01 but another's is 0.3. This means the model is biased and treats groups unequally.
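One way to operationalize this comparison is to flag each metric whose max-min gap across groups exceeds a threshold. The sketch below uses the numbers from the example above; the 0.1 threshold is an illustrative assumption, not an established standard:

```python
# Sketch: flag a metric as unfair when its gap across groups exceeds a
# threshold. The 0.1 threshold is an illustrative assumption.
def fairness_gaps(per_group, threshold=0.1):
    """For each metric, report the max-min gap across groups and a flag."""
    metric_names = next(iter(per_group.values())).keys()
    result = {}
    for name in metric_names:
        values = [m[name] for m in per_group.values()]
        gap = max(values) - min(values)
        result[name] = {"gap": round(gap, 3), "fair": gap <= threshold}
    return result

per_group = {
    "A": {"recall": 0.947, "fpr": 0.095},
    "B": {"recall": 0.778, "fpr": 0.273},
}
print(fairness_gaps(per_group))
# recall gap 0.169 and fpr gap 0.178 both exceed 0.1, so both are flagged
```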
- Ignoring group differences: Reporting only overall metrics hides whether some groups perform poorly.
- Data imbalance: If some groups have far fewer samples, their metrics are noisy and can be misleading.
- Overfitting to the majority group: The model may perform well on large groups but poorly on minorities.
- Relying on accuracy alone: Overall accuracy can stay high even when the model fails a minority group, because the majority group dominates the average.
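The last pitfall can be demonstrated with a quick sketch: overall accuracy stays high while the minority group's recall collapses. The counts below are illustrative assumptions:

```python
# Sketch: high overall accuracy hiding poor minority-group recall.
# Counts are illustrative assumptions.
majority = {"tp": 900, "fp": 10, "fn": 10, "tn": 80}   # 1000 samples
minority = {"tp": 10, "fp": 2, "fn": 10, "tn": 28}     # 50 samples

def accuracy(c):
    return (c["tp"] + c["tn"]) / sum(c.values())

def recall(c):
    return c["tp"] / (c["tp"] + c["fn"])

# Pool the counts as a single "overall" evaluation would.
overall = {k: majority[k] + minority[k] for k in majority}
print(round(accuracy(overall), 3))  # ~0.97 overall accuracy
print(recall(minority))             # 0.5 recall for the minority group
```

The pooled accuracy looks excellent, yet half of the minority group's genuine users are missed, which only a per-group breakdown reveals.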
Your face recognition model has 98% overall accuracy but only 50% recall for a minority group. Is it good for production? Why or why not?
Answer: No. Even though overall accuracy is high, 50% recall for the minority group means half of the genuine users in that group fail to be recognized. This is unfair, harms the user experience for that group, and should be fixed before deployment.