
Fairness in face recognition in Computer Vision - ML Experiment: Train & Evaluate


Experiment - Fairness in face recognition

Problem: We want to build a face recognition model that performs comparably well across all skin tones. Currently, the model performs well on lighter skin tones but poorly on darker skin tones.

Current Metrics: Overall accuracy: 90%, Accuracy on lighter skin tones: 95%, Accuracy on darker skin tones: 75%

Issue: The model is biased toward lighter skin tones, leaving a 20-percentage-point accuracy gap between the two groups.

Your Task

Reduce the accuracy gap between lighter and darker skin tones to less than 5 percentage points, while keeping overall accuracy above 85%.

You can only modify the training process and data handling.

Do not change the model architecture.

Keep training time reasonable (under 30 minutes).
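The two accuracy targets above can be written down as a small check. This is a minimal sketch, not part of the original exercise: the function name `meets_task_targets` and its parameter names are hypothetical, and accuracies are assumed to be fractions between 0 and 1.

```python
def meets_task_targets(acc_lighter, acc_darker, overall_acc,
                       max_gap=0.05, min_overall=0.85):
    """Return (ok, gap) for the task's two accuracy constraints.

    Accuracies are fractions in [0, 1]. The gap is an absolute
    difference, i.e. percentage points once multiplied by 100.
    """
    gap = abs(acc_lighter - acc_darker)
    ok = gap < max_gap and overall_acc > min_overall
    return ok, gap


# Starting point from "Current Metrics": 95% vs 75%, 90% overall.
ok, gap = meets_task_targets(0.95, 0.75, 0.90)
print(f"gap = {gap * 100:.1f} points, targets met: {ok}")
```

With the current metrics the gap is 20 points, so the check fails even though overall accuracy is above 85%.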


Solution


import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Simulated dataset loading function
# X: images, y: labels, skin_tones: 0 for lighter, 1 for darker
X = np.load('face_images.npy')  # shape (num_samples, 64, 64, 3)
y = np.load('face_labels.npy')  # shape (num_samples,)
skin_tones = np.load('skin_tones.npy')  # shape (num_samples,)

# Split data
X_train, X_test, y_train, y_test, skin_train, skin_test = train_test_split(
    X, y, skin_tones, test_size=0.2, random_state=42, stratify=y)

# Data augmentation for darker skin tones
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

# Separate darker skin tone samples
dark_indices = np.where(skin_train == 1)[0]
X_dark = X_train[dark_indices]
y_dark = y_train[dark_indices]

# Augment darker skin tone images to balance dataset
augmented_images = []
augmented_labels = []
for i in range(len(X_dark)):
    x = X_dark[i].reshape((1,) + X_dark[i].shape)
    aug_iter = datagen.flow(x, batch_size=1)
    for _ in range(3):  # create 3 augmented images per original
        batch = next(aug_iter)
        augmented_images.append(batch[0])
        augmented_labels.append(y_dark[i])

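# Combine original and augmented data
X_train_balanced = np.concatenate([X_train, np.array(augmented_images)])
y_train_balanced = np.concatenate([y_train, np.array(augmented_labels)])
skin_train_balanced = np.concatenate([skin_train, np.ones(len(augmented_images), dtype=skin_train.dtype)])

# Shuffle the combined arrays: Keras validation_split takes the last
# fraction of the data, which would otherwise be only augmented images
perm = np.random.default_rng(42).permutation(len(X_train_balanced))
X_train_balanced = X_train_balanced[perm]
y_train_balanced = y_train_balanced[perm]
skin_train_balanced = skin_train_balanced[perm]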

# Convert labels to categorical
num_classes = len(np.unique(y))
y_train_cat = to_categorical(y_train_balanced, num_classes)
y_test_cat = to_categorical(y_test, num_classes)

# Compute class weights over the identity labels (these balance classes, not skin-tone groups)
from sklearn.utils import class_weight
class_weights = class_weight.compute_class_weight('balanced', classes=np.unique(y_train_balanced), y=y_train_balanced)
class_weight_dict = dict(enumerate(class_weights))

# Define simple CNN model
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(64,64,3)),
    MaxPooling2D(2,2),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model with class weights
model.fit(X_train_balanced, y_train_cat, epochs=15, batch_size=32, class_weight=class_weight_dict, validation_split=0.1)

# Evaluate overall accuracy
loss, overall_acc = model.evaluate(X_test, y_test_cat, verbose=0)

# Evaluate accuracy by skin tone
from sklearn.metrics import accuracy_score

y_pred_probs = model.predict(X_test)
y_pred = np.argmax(y_pred_probs, axis=1)

acc_lighter = accuracy_score(y_test[skin_test == 0], y_pred[skin_test == 0])
acc_darker = accuracy_score(y_test[skin_test == 1], y_pred[skin_test == 1])

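print(f'Overall accuracy: {overall_acc*100:.2f}%')
print(f'Accuracy on lighter skin tones: {acc_lighter*100:.2f}%')
print(f'Accuracy on darker skin tones: {acc_darker*100:.2f}%')
# The task target is stated in terms of the gap, so report it explicitly
print(f'Accuracy gap: {abs(acc_lighter - acc_darker)*100:.2f} percentage points')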

Added data augmentation specifically for darker skin tone images to increase their representation.

Balanced the training dataset by combining original and augmented images.

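Used class weighting during training to give more importance to underrepresented classes. Note that these weights are computed over the identity labels, so they balance classes rather than skin-tone groups directly.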

Monitored accuracy separately for lighter and darker skin tones.
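Because the class weights in the solution are tied to identity labels, one possible variation is to weight samples by skin-tone group instead. The sketch below is an assumption layered on top of the solution, not the solution's own method: `group_sample_weights` is a hypothetical helper that assigns each sample an inverse-frequency weight based on its group label.

```python
import numpy as np


def group_sample_weights(groups):
    """Inverse-frequency weight per sample, based on its group label.

    Every group ends up with the same total weight, and the weights
    average to 1 so the overall loss scale stays the same.
    """
    groups = np.asarray(groups)
    values, counts = np.unique(groups, return_counts=True)
    # Weight for group g = n_samples / (n_groups * count_g)
    per_group = len(groups) / (len(values) * counts)
    lookup = dict(zip(values.tolist(), per_group.tolist()))
    return np.array([lookup[g] for g in groups.tolist()])


# Toy example (hypothetical data): 6 lighter (0) and 2 darker (1) samples.
toy_groups = np.array([0, 0, 0, 0, 0, 0, 1, 1])
weights = group_sample_weights(toy_groups)
print(weights)
```

Weights like these could be passed to Keras through the `sample_weight` argument of `model.fit`. Whether this works better than augmentation plus class weights on a given dataset would need to be measured.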

Results Interpretation

Before changes:
Overall accuracy: 90%
Accuracy on lighter skin tones: 95%
Accuracy on darker skin tones: 75%

After changes:
Overall accuracy: 88%
Accuracy on lighter skin tones: 90%
Accuracy on darker skin tones: 86%

Balancing data representation and using class weighting can reduce bias and improve fairness in machine learning models: here the accuracy gap between the two groups narrows from 20 to 4 percentage points, while overall accuracy drops slightly from 90% to 88%.
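The before/after comparison depends on measuring accuracy per group. A minimal sketch of that measurement, generalized to any number of groups, is shown here. The helper name `accuracy_by_group` and the toy arrays are hypothetical, and group labels are assumed to be integers.

```python
import numpy as np


def accuracy_by_group(y_true, y_pred, groups):
    """Return ({group: accuracy}, largest gap between any two groups)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    per_group = {}
    for g in np.unique(groups):
        mask = groups == g
        per_group[int(g)] = float(np.mean(y_true[mask] == y_pred[mask]))
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap


# Toy data (hypothetical): 4 samples in each of two groups.
y_true_toy = np.array([0, 1, 2, 3, 0, 1, 2, 3])
y_pred_toy = np.array([0, 1, 2, 3, 0, 1, 0, 0])
groups_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])

per_group, gap = accuracy_by_group(y_true_toy, y_pred_toy, groups_toy)
print(per_group, gap)
```

Reporting the largest gap rather than a single pairwise difference keeps the same check usable if the dataset is later annotated with more than two skin-tone groups.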

Bonus Experiment

Try using a fairness-aware loss function or adversarial training to further reduce bias.

💡 Hint

Look into techniques like domain adversarial training or adding fairness constraints during model training.
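One simple form a fairness constraint can take is a penalty on the difference in average loss between groups. The NumPy sketch below only illustrates the quantity being penalized. It is not a specific published method, the names `cross_entropy`, `gap_penalized_loss`, and `lam` are hypothetical, and a trainable version would have to be written with differentiable framework operations.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Per-sample cross-entropy for integer class labels."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, 1e-12, 1.0))


def gap_penalized_loss(probs, labels, groups, lam=1.0):
    """Mean loss plus lam * |mean loss of group 0 - mean loss of group 1|."""
    groups = np.asarray(groups)
    losses = cross_entropy(probs, labels)
    gap = abs(losses[groups == 0].mean() - losses[groups == 1].mean())
    return losses.mean() + lam * gap


# Toy batch (hypothetical): the model is more confident on group 0.
toy_probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.5, 0.5]]
toy_labels = [0, 0, 0, 0]
toy_groups = [0, 0, 1, 1]

plain = gap_penalized_loss(toy_probs, toy_labels, toy_groups, lam=0.0)
penalized = gap_penalized_loss(toy_probs, toy_labels, toy_groups, lam=1.0)
print(plain, penalized)
```

Setting `lam` to 0 recovers the ordinary mean loss, and larger values push the optimizer toward equal loss on both groups. This sketch assumes every batch contains samples from both groups. A batch missing one group would need special handling.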

Practice


What does fairness in face recognition mainly aim to achieve?

easy

A. More complex model architecture

B. Faster processing speed

C. Higher resolution images

D. Equal accuracy for all demographic groups

Which of the following is the correct way to check fairness in a face recognition model?

metrics = {'group_A': 0.92, 'group_B': 0.85}
# What should we compare?

easy

A. Only check metrics['group_A']

B. Compare metrics['group_A'] and metrics['group_B'] for equality

C. Ignore metrics and check model size

D. Compare metrics['group_A'] with a random number

Consider this Python code snippet evaluating fairness metrics:

group_accuracies = {'A': 0.90, 'B': 0.75, 'C': 0.88}
threshold = 0.80
biased_groups = [g for g, acc in group_accuracies.items() if acc < threshold]
print(biased_groups)

What is the output?

medium

A. ['B']

B. ['A', 'B']

C. ['C']

D. []

Find the error in this fairness evaluation code snippet:

metrics = {'group1': 0.85, 'group2': 0.80}
threshold = 0.82
biased = [g for g, v in metrics if v < threshold]
print(biased)

medium

A. Missing .items() when iterating over dictionary

B. Wrong comparison operator

C. Threshold value is too high

D. Print statement syntax error


Practice

Solution

Step 1: Understand fairness goal

Step 2: Identify fairness metric

Final Answer: D. Equal accuracy for all demographic groups

Quick Check:

Solution

Step 1: Identify fairness check

Step 2: Apply comparison

Final Answer: B. Compare metrics['group_A'] and metrics['group_B'] for equality

Quick Check:

Solution

Step 1: Understand the code logic

Step 2: Check each group's accuracy

Final Answer: A. ['B'] (only group B, at 0.75, falls below the 0.80 threshold)

Quick Check:

Solution

Step 1: Identify dictionary iteration error

Step 2: Fix iteration to use .items()

Final Answer: A. Missing .items() when iterating over dictionary

Quick Check:

Solution

Step 1: Identify fairness problem

Step 2: Choose best fairness improvement

Step 3: Evaluate other options

Final Answer:

Quick Check: