
Fairness in face recognition in Computer Vision - ML Experiment: Train & Evaluate

Experiment - Fairness in face recognition
Problem: We want to build a face recognition model that performs comparably well across all skin tones. Currently, the model performs well on lighter skin tones but poorly on darker skin tones.
Current Metrics: Overall accuracy: 90%, Accuracy on lighter skin tones: 95%, Accuracy on darker skin tones: 75%
Issue: The model is biased toward lighter skin tones, causing an unfair 20-percentage-point accuracy gap between the two groups.
Your Task
Reduce the accuracy gap between lighter and darker skin tones to less than 5 percentage points, while keeping overall accuracy above 85%.
You can only modify the training process and data handling.
Do not change the model architecture.
Keep training time reasonable (under 30 minutes).
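The accuracy targets above can be written as a small check to run after each training attempt. This is a minimal sketch: the function name `meets_fairness_target` and its parameters are hypothetical, and all accuracies are assumed to be given in percent.

```python
def meets_fairness_target(acc_lighter, acc_darker, overall_acc,
                          max_gap=5.0, min_overall=85.0):
    """Return True when both task targets are met.

    All inputs are percentages; the gap is measured in percentage points.
    """
    gap = abs(acc_lighter - acc_darker)
    return gap < max_gap and overall_acc > min_overall


# Starting metrics from the problem statement: the gap is 20 points, so the check fails
print(meets_fairness_target(95.0, 75.0, 90.0))
```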
Solution
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Simulated dataset loading function
# X: images, y: labels, skin_tones: 0 for lighter, 1 for darker
X = np.load('face_images.npy')  # shape (num_samples, 64, 64, 3)
y = np.load('face_labels.npy')  # shape (num_samples,)
skin_tones = np.load('skin_tones.npy')  # shape (num_samples,)

# Split data
X_train, X_test, y_train, y_test, skin_train, skin_test = train_test_split(
    X, y, skin_tones, test_size=0.2, random_state=42, stratify=y)

# Data augmentation for darker skin tones
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

# Separate darker skin tone samples
dark_indices = np.where(skin_train == 1)[0]
X_dark = X_train[dark_indices]
y_dark = y_train[dark_indices]

# Augment darker skin tone images to balance dataset
augmented_images = []
augmented_labels = []
for i in range(len(X_dark)):
    x = X_dark[i].reshape((1,) + X_dark[i].shape)
    aug_iter = datagen.flow(x, batch_size=1)
    for _ in range(3):  # create 3 augmented images per original
        batch = next(aug_iter)
        augmented_images.append(batch[0])
        augmented_labels.append(y_dark[i])

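# Combine original and augmented data
X_train_balanced = np.concatenate([X_train, np.array(augmented_images)])
y_train_balanced = np.concatenate([y_train, np.array(augmented_labels)])
skin_train_balanced = np.concatenate(
    [skin_train, np.ones(len(augmented_images), dtype=skin_train.dtype)])

# Shuffle so that validation_split, which holds out the last samples,
# does not validate only on augmented darker-skin-tone images
perm = np.random.default_rng(42).permutation(len(X_train_balanced))
X_train_balanced = X_train_balanced[perm]
y_train_balanced = y_train_balanced[perm]
skin_train_balanced = skin_train_balanced[perm]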

# Convert labels to categorical
num_classes = len(np.unique(y))
y_train_cat = to_categorical(y_train_balanced, num_classes)
y_test_cat = to_categorical(y_test, num_classes)

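# Compute class weights to balance the label classes
# (these weights come from y, not from skin tone)
from sklearn.utils import class_weight
classes = np.unique(y_train_balanced)
class_weights = class_weight.compute_class_weight('balanced', classes=classes, y=y_train_balanced)
# Map each label value to its weight instead of relying on position
class_weight_dict = dict(zip(classes.tolist(), class_weights))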

# Define simple CNN model
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(64,64,3)),
    MaxPooling2D(2,2),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model with class weights
model.fit(X_train_balanced, y_train_cat, epochs=15, batch_size=32, class_weight=class_weight_dict, validation_split=0.1)

# Evaluate overall accuracy
loss, overall_acc = model.evaluate(X_test, y_test_cat, verbose=0)

# Evaluate accuracy by skin tone
from sklearn.metrics import accuracy_score

y_pred_probs = model.predict(X_test)
y_pred = np.argmax(y_pred_probs, axis=1)

acc_lighter = accuracy_score(y_test[skin_test == 0], y_pred[skin_test == 0])
acc_darker = accuracy_score(y_test[skin_test == 1], y_pred[skin_test == 1])

print(f'Overall accuracy: {overall_acc*100:.2f}%')
print(f'Accuracy on lighter skin tones: {acc_lighter*100:.2f}%')
print(f'Accuracy on darker skin tones: {acc_darker*100:.2f}%')
Added data augmentation specifically for darker skin tone images to increase their representation.
Balanced the training dataset by combining original and augmented images.
Used class weighting during training to give more importance to underrepresented label classes (the weights are computed from the labels, not from skin tone).
Monitored accuracy separately for lighter and darker skin tones.
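The per-group monitoring in the last step can be wrapped in a reusable helper. The sketch below is plain Python and does not depend on the model; the name `accuracy_by_group` and the toy labels are hypothetical and serve only to show the calculation.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for every group value that appears in `groups`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Toy data for illustration only (not real results)
y_true = [0, 1, 1, 0, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 0, 1]
skin = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = lighter, 1 = darker

per_group = accuracy_by_group(y_true, y_pred, skin)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

Reporting the gap alongside the per-group accuracies makes it easy to compare runs against the task target.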
Results Interpretation

Before changes:
Overall accuracy: 90%
Accuracy on lighter skin tones: 95%
Accuracy on darker skin tones: 75%

After changes:
Overall accuracy: 88%
Accuracy on lighter skin tones: 90%
Accuracy on darker skin tones: 86%

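Balancing data representation and using class weighting can reduce bias and improve fairness in machine learning models, even if overall accuracy decreases slightly. Here the accuracy gap narrows from 20 to 4 percentage points, while overall accuracy drops from 90% to 88%.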
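When reading figures like these, it helps to remember that on a fixed test set the overall accuracy is the group-size-weighted mean of the per-group accuracies: overall = p * acc_lighter + (1 - p) * acc_darker, where p is the share of lighter-skin-tone samples. The sketch below solves that relation for p; the function name `implied_share` is hypothetical.

```python
def implied_share(overall, acc_a, acc_b):
    """Share p of group A implied by overall = p * acc_a + (1 - p) * acc_b."""
    return (overall - acc_b) / (acc_a - acc_b)


before = implied_share(90, 95, 75)  # lighter-skin share implied by the "before" figures
after = implied_share(88, 90, 86)   # lighter-skin share implied by the "after" figures
print(before, after)
```

The two sets of figures imply different lighter-skin shares (0.75 and 0.5), so they are best read as illustrative numbers rather than measurements on a single fixed test set. When running the experiment, compare runs on the same test split.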
Bonus Experiment
Try using a fairness-aware loss function or adversarial training to further reduce bias.
💡 Hint
Look into techniques like domain adversarial training or adding fairness constraints during model training.
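One simple form of fairness constraint is to add a penalty on the difference between the two groups' mean losses. The sketch below shows only the arithmetic in plain Python; it is not domain adversarial training, the function name and the weight `lam` are hypothetical, and an actual training loss would need to be written with differentiable framework operations. It assumes every batch contains samples from both groups.

```python
import math


def fairness_penalized_loss(probs, labels, groups, lam=1.0):
    """Mean cross-entropy plus lam * |mean loss of group 0 - mean loss of group 1|.

    probs: per-sample lists of class probabilities
    labels: integer class index per sample
    groups: 0 or 1 per sample (both groups must be present)
    """
    per_sample = [-math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels)]
    mean_loss = sum(per_sample) / len(per_sample)
    loss_g0 = [l for l, g in zip(per_sample, groups) if g == 0]
    loss_g1 = [l for l, g in zip(per_sample, groups) if g == 1]
    gap = abs(sum(loss_g0) / len(loss_g0) - sum(loss_g1) / len(loss_g1))
    return mean_loss + lam * gap


# Toy batch: group 1 receives less confident predictions than group 0
probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.5, 0.5]]
labels = [0, 0, 0, 0]
groups = [0, 0, 1, 1]

base = fairness_penalized_loss(probs, labels, groups, lam=0.0)
penalized = fairness_penalized_loss(probs, labels, groups, lam=1.0)
print(base < penalized)
```

With `lam=0` the function reduces to ordinary mean cross-entropy; larger values of `lam` trade some overall fit for a smaller gap between groups.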

Practice

1.

What does fairness in face recognition mainly aim to achieve?

easy
A. More complex model architecture
B. Faster processing speed
C. Higher resolution images
D. Equal accuracy for all demographic groups

Solution

  1. Step 1: Understand fairness goal

    Fairness means the model should work equally well for all groups, not just some.
  2. Step 2: Identify fairness metric

    Accuracy or error rates should be similar across different demographic groups.
  3. Final Answer:

    Equal accuracy for all demographic groups -> Option D
  4. Quick Check:

    Fairness = Equal accuracy [OK]
Hint: Fairness means equal results for everyone [OK]
Common Mistakes:
  • Thinking fairness means faster models
  • Confusing fairness with image quality
  • Assuming complex models are always fair
2.

Which of the following is the correct way to check fairness in a face recognition model?

metrics = {'group_A': 0.92, 'group_B': 0.85}
# What should we compare?
easy
A. Only check metrics['group_A']
B. Compare metrics['group_A'] and metrics['group_B'] for equality
C. Ignore metrics and check model size
D. Compare metrics['group_A'] with a random number

Solution

  1. Step 1: Identify fairness check

    Fairness requires comparing performance metrics across groups.
  2. Step 2: Apply comparison

    Compare accuracy or error rates between group_A and group_B to find bias.
  3. Final Answer:

    Compare metrics['group_A'] and metrics['group_B'] for equality -> Option B
  4. Quick Check:

    Fairness check = Compare group metrics [OK]
Hint: Compare group metrics to check fairness [OK]
Common Mistakes:
  • Checking only one group
  • Ignoring metrics and focusing on model size
  • Comparing to unrelated values
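In practice, group metrics are rarely exactly equal, so the comparison is usually made against a tolerance. The sketch below reuses the question's `metrics` dictionary; the helper name `accuracy_gap` and the `tolerance` value are assumptions chosen for illustration.

```python
def accuracy_gap(metrics):
    """Largest difference in accuracy between any two groups."""
    return max(metrics.values()) - min(metrics.values())


metrics = {'group_A': 0.92, 'group_B': 0.85}
tolerance = 0.05  # hypothetical acceptable gap

gap = accuracy_gap(metrics)
print(f"gap = {gap:.2f}, within tolerance: {gap <= tolerance}")
```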
3.

Consider this Python code snippet evaluating fairness metrics:

group_accuracies = {'A': 0.90, 'B': 0.75, 'C': 0.88}
threshold = 0.80
biased_groups = [g for g, acc in group_accuracies.items() if acc < threshold]
print(biased_groups)

What is the output?

medium
A. ['B']
B. ['A', 'B']
C. ['C']
D. []

Solution

  1. Step 1: Understand the code logic

    The code collects groups with accuracy less than 0.80 into biased_groups.
  2. Step 2: Check each group's accuracy

    Group A: 0.90 > 0.80 (not biased), B: 0.75 < 0.80 (biased), C: 0.88 > 0.80 (not biased)
  3. Final Answer:

    ['B'] -> Option A
  4. Quick Check:

    Only group B accuracy < threshold [OK]
Hint: Filter groups with accuracy below threshold [OK]
Common Mistakes:
  • Including groups with accuracy above threshold
  • Misreading comparison operator
  • Confusing list comprehension output
4.

Find the error in this fairness evaluation code snippet:

metrics = {'group1': 0.85, 'group2': 0.80}
threshold = 0.82
biased = [g for g, v in metrics if v < threshold]
print(biased)
medium
A. Missing .items() when iterating over dictionary
B. Wrong comparison operator
C. Threshold value is too high
D. Print statement syntax error

Solution

  1. Step 1: Identify dictionary iteration error

    Iterating over a dictionary directly yields its keys, not key-value pairs, so unpacking each key string into g and v raises a ValueError.
  2. Step 2: Fix iteration to use .items()

    Use metrics.items() to get (key, value) pairs for comparison.
  3. Final Answer:

    Missing .items() when iterating over dictionary -> Option A
  4. Quick Check:

    Dictionary iteration needs .items() [OK]
Hint: Use .items() to get key-value pairs from dict [OK]
Common Mistakes:
  • Iterating dict keys instead of items
  • Changing threshold unnecessarily
  • Assuming print syntax is wrong
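The failure and the fix can be reproduced side by side. This sketch uses the question's own values and wraps the faulty line in a `try` block so that both versions run in one script.

```python
metrics = {'group1': 0.85, 'group2': 0.80}
threshold = 0.82

# Iterating over the dict itself yields only keys, so unpacking into (g, v) fails
try:
    biased = [g for g, v in metrics if v < threshold]
except ValueError as err:
    print(f"ValueError: {err}")

# Corrected: .items() yields (key, value) pairs
biased = [g for g, v in metrics.items() if v < threshold]
print(biased)
```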
5.

You have a face recognition model with accuracy 0.95 on group X and 0.70 on group Y. Which approach best improves fairness?

hard
A. Ignore group Y and focus on group X
B. Increase model complexity without changing data
C. Collect more balanced training data including group Y
D. Reduce accuracy on group X to match group Y

Solution

  1. Step 1: Identify fairness problem

    Model performs worse on group Y, showing bias.
  2. Step 2: Choose best fairness improvement

    Balanced data helps model learn features for all groups equally.
  3. Step 3: Evaluate other options

    Increasing complexity alone may not fix bias; ignoring group Y is unfair; reducing group X accuracy is not ideal.
  4. Final Answer:

    Collect more balanced training data including group Y -> Option C
  5. Quick Check:

    Balanced data improves fairness [OK]
Hint: Balance training data to reduce bias [OK]
Common Mistakes:
  • Thinking model complexity fixes bias alone
  • Ignoring underperforming groups
  • Lowering accuracy on better groups
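When collecting more data for group Y is not immediately possible, a common stopgap is to reweight the existing samples so that each group contributes equally to the training loss. This is a different technique from the quiz answer, shown as a minimal sketch; the function name `group_balancing_weights` and the toy group sizes are hypothetical. The formula is the usual "balanced" heuristic, n_samples / (n_groups * group_count).

```python
from collections import Counter


def group_balancing_weights(groups):
    """Weight each sample by n_samples / (n_groups * size of its group)."""
    counts = Counter(groups)
    n_samples, n_groups = len(groups), len(counts)
    return [n_samples / (n_groups * counts[g]) for g in groups]


# Toy split for illustration: 8 samples from group X, 2 from group Y
groups = ['X'] * 8 + ['Y'] * 2
weights = group_balancing_weights(groups)

# After weighting, both groups carry the same total weight
print(sum(weights[:8]), sum(weights[8:]))
```

Weights like these could be passed as per-sample weights during training, for example through the `sample_weight` argument of Keras `model.fit`.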