NLP (~20 mins)

Bias and fairness in NLP - ML Experiment: Train & Evaluate

Problem: You have a sentiment analysis model trained on movie reviews. The model shows good overall accuracy but performs worse on reviews written by certain demographic groups, indicating bias.
Current Metrics: Overall accuracy: 88%, Accuracy on group A: 90%, Accuracy on group B: 75%
Issue: The model is biased against group B, whose reviews are classified with substantially lower accuracy.
Your Task
Reduce bias so that accuracy on group B improves to at least 85% while maintaining overall accuracy above 85%.
You cannot collect new data.
You must use the existing dataset and model architecture.
You can only modify training procedures or add fairness techniques.
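Before changing the training procedure, it helps to quantify the gap you are trying to close. A minimal sketch of a per-group accuracy audit (the `y_true`, `y_pred`, and `groups` arrays here are illustrative, not the experiment's data):

```python
import numpy as np

def group_accuracies(y_true, y_pred, groups):
    """Return accuracy per demographic group and the worst-case gap."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    accs = {g: float((y_pred[groups == g] == y_true[groups == g]).mean())
            for g in np.unique(groups)}
    gap = max(accs.values()) - min(accs.values())
    return accs, gap

# Toy example: group 1's predictions are all wrong, group 0's are all right
y_true = np.array([1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 1])
groups = np.array([0, 0, 0, 1, 1, 1])
accs, gap = group_accuracies(y_true, y_pred, groups)
print(accs, gap)
```

Tracking this gap alongside overall accuracy makes it clear whether a fairness intervention is actually helping group B rather than just shifting errors around.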
Solution
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample data setup (simplified)
texts = ["good movie", "bad movie", "excellent film", "terrible film"] * 100
labels = np.array([1, 0, 1, 0] * 100)  # array, so boolean masking works later
# Group labels: 0 for group A, 1 for group B
groups = np.array([0, 0, 1, 1] * 100)

# Tokenize texts
max_words = 1000
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
data = pad_sequences(sequences, maxlen=5)

# Create sample weights to reduce bias: give higher weight to group B samples
sample_weights = np.where(groups == 1, 2.0, 1.0)

# Split data
X_train, X_val, y_train, y_val, sw_train, sw_val, groups_train, groups_val = train_test_split(
    data, labels, sample_weights, groups, test_size=0.2, random_state=42, stratify=labels)

# Build simple model
model = Sequential([
    Embedding(max_words, 16, input_length=5),
    LSTM(16),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train with sample weights to reduce bias
model.fit(X_train, y_train, sample_weight=sw_train, epochs=10, batch_size=16, validation_data=(X_val, y_val, sw_val))

# Evaluate overall accuracy
val_preds = (model.predict(X_val) > 0.5).astype(int).flatten()
accuracy_overall = accuracy_score(y_val, val_preds) * 100

# Evaluate accuracy on group B in validation
group_b_mask = (groups_val == 1)
accuracy_group_b = accuracy_score(y_val[group_b_mask], val_preds[group_b_mask]) * 100

print(f"Validation accuracy overall: {accuracy_overall:.2f}%")
print(f"Validation accuracy group B: {accuracy_group_b:.2f}%")
Key Changes

Added sample weights to give more importance to group B samples during training.
Included a dropout layer to reduce overfitting and improve generalization.
Kept the model architecture unchanged to isolate the effect of the fairness intervention.
Passed sample weights in validation_data so the validation loss reflects the same weighting as training.
Results Interpretation

Before: Overall accuracy: 88%, Group B accuracy: 75%

After: Overall accuracy: 87%, Group B accuracy: 86%

Using sample weighting during training can reduce bias by making the model pay more attention to underrepresented or disadvantaged groups, improving fairness without sacrificing much overall accuracy.
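The fixed weight of 2.0 above is hand-tuned. A common alternative (a sketch, not part of the original solution) is to weight each sample inversely to its group's frequency, mirroring scikit-learn's "balanced" class-weight scheme:

```python
import numpy as np

def balanced_group_weights(groups):
    """Weight each sample by n_samples / (n_groups * count_of_its_group)."""
    groups = np.asarray(groups)
    counts = np.bincount(groups)
    return len(groups) / (len(np.unique(groups)) * counts[groups])

groups = np.array([0, 0, 0, 1])  # group B (1) is underrepresented
weights = balanced_group_weights(groups)
print(weights)  # the group-1 sample receives the larger weight
```

These weights can be passed to `model.fit(..., sample_weight=weights)` in place of the hand-picked `np.where(groups == 1, 2.0, 1.0)`, which removes one hyperparameter when group sizes are imbalanced.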
Bonus Experiment
Try using adversarial training to remove demographic information from the model's internal representation to further reduce bias.
💡 Hint
Add an adversarial network that tries to predict group labels from model features and train the main model to fool it.
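One common way to implement this hint is a gradient reversal layer: an adversary head learns to predict the group label from the shared features, while the reversed gradient pushes the encoder to discard group information. A minimal Keras sketch (layer sizes, names, and the 0.5 adversary loss weight are illustrative assumptions):

```python
import tensorflow as tf

@tf.custom_gradient
def reverse_gradient(x):
    # Identity on the forward pass; flips the gradient sign on the backward pass
    def grad(dy):
        return -dy
    return tf.identity(x), grad

class GradientReversal(tf.keras.layers.Layer):
    def call(self, inputs):
        return reverse_gradient(inputs)

inputs = tf.keras.Input(shape=(5,))
x = tf.keras.layers.Embedding(1000, 16)(inputs)
features = tf.keras.layers.LSTM(16)(x)

# Main task head: sentiment prediction
sentiment = tf.keras.layers.Dense(1, activation='sigmoid', name='sentiment')(features)

# Adversary head: tries to recover the group label from reversed features
adv = GradientReversal()(features)
group = tf.keras.layers.Dense(1, activation='sigmoid', name='group')(adv)

model = tf.keras.Model(inputs, [sentiment, group])
model.compile(optimizer='adam',
              loss={'sentiment': 'binary_crossentropy',
                    'group': 'binary_crossentropy'},
              loss_weights={'sentiment': 1.0, 'group': 0.5})
# Train with both label sets, e.g.:
# model.fit(X_train, {'sentiment': y_train, 'group': groups_train}, ...)
```

Because the adversary's gradient is negated before reaching the encoder, minimizing the combined loss simultaneously improves sentiment accuracy and makes the group label harder to predict from the learned representation.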