NLP · ~20 mins

Bias and Fairness in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual (intermediate)
Understanding Bias Types in NLP

Which of the following best describes representation bias in natural language processing?

A. Bias caused by errors in tokenization or text preprocessing steps.
B. Bias introduced by the model architecture that favors certain words over others.
C. Bias caused by unbalanced or skewed training data that underrepresents certain groups or topics.
D. Bias that occurs only during model deployment due to hardware limitations.
💡 Hint

Think about how the data itself might not fairly represent all groups or topics.
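Representation bias can be spotted before training by counting how often each group appears in the data. A minimal sketch, using a made-up corpus of (group, text) pairs rather than any real dataset:

```python
from collections import Counter

# Hypothetical labeled corpus: (demographic_group, text) pairs.
corpus = [
    ("group_a", "sample text 1"),
    ("group_a", "sample text 2"),
    ("group_a", "sample text 3"),
    ("group_b", "sample text 4"),
]

# Share of the corpus contributed by each group.
counts = Counter(group for group, _ in corpus)
total = sum(counts.values())
shares = {g: n / total for g, n in counts.items()}
print(shares)  # {'group_a': 0.75, 'group_b': 0.25} -- group_a dominates
```

A model trained on this corpus sees three times as much data from `group_a`, which is exactly the kind of skew option C describes.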

Metrics (intermediate)
Measuring Fairness with Equality of Opportunity

In a binary classification NLP task, which metric best captures equality of opportunity between two demographic groups?

A. Difference in true positive rates (TPR) between the groups.
B. Difference in overall accuracy between the groups.
C. Difference in false positive rates (FPR) between the groups.
D. Difference in precision between the groups.
💡 Hint

Equality of opportunity focuses on equal chances to correctly identify positive cases.
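The per-group TPR comparison behind equality of opportunity can be computed in a few lines. This is a sketch with invented labels, predictions, and group assignments:

```python
# Hypothetical ground truth, predictions, and group membership.
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["A", "A", "A", "B", "B", "B"]

def tpr(y_true, y_pred, groups, g):
    # True positive rate within group g: P(pred = 1 | true = 1, group = g).
    positives = [p for t, p, grp in zip(y_true, y_pred, groups) if grp == g and t == 1]
    return sum(positives) / len(positives)

# Equality-of-opportunity gap: difference in TPR across groups.
gap = abs(tpr(y_true, y_pred, groups, "A") - tpr(y_true, y_pred, groups, "B"))
print(gap)  # 0.5
```

Here group A's positives are caught half the time while group B's are always caught, so the opportunity gap is 0.5.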

Predict Output (advanced)
Output of Bias Mitigation Code Snippet

What is the output of the following Python code that applies simple bias mitigation by equalizing sample counts?

Python
from collections import Counter

data = [('male', 'positive'), ('female', 'positive'), ('male', 'negative'), ('male', 'positive'), ('female', 'negative')]

# Count samples per gender
counts = Counter([d[0] for d in data])
min_count = min(counts.values())

# Equalize samples by truncating
balanced_data = []
counts_seen = {'male': 0, 'female': 0}
for d in data:
    gender = d[0]
    if counts_seen[gender] < min_count:
        balanced_data.append(d)
        counts_seen[gender] += 1

print(balanced_data)
A. [('male', 'positive'), ('female', 'positive'), ('male', 'negative')]
B. [('female', 'positive'), ('female', 'negative')]
C. [('male', 'positive'), ('male', 'negative')]
D. [('male', 'positive'), ('female', 'positive'), ('male', 'negative'), ('female', 'negative')]
💡 Hint

Check how many samples each gender has and how many are kept after truncation.

Model Choice (advanced)
Choosing a Model Architecture to Reduce Gender Bias

Which model architecture is best suited to reduce gender bias in a sentiment analysis task on social media text?

A. A simple logistic regression model trained on raw word counts.
B. A transformer model with adversarial training to remove gender information from embeddings.
C. A convolutional neural network without any bias mitigation techniques.
D. A recurrent neural network trained only on male-authored texts.
💡 Hint

Consider architectures that explicitly try to remove bias signals.
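Adversarial training needs a full training loop with a discriminator, but the goal it pursues (embeddings that carry no gender signal) can be illustrated with a simpler, related technique: linear "hard debiasing", which projects each embedding onto the subspace orthogonal to a gender direction. The vectors below are made-up toys, not real embeddings:

```python
import numpy as np

# Toy 3-d "gender direction" and word vector (both invented for illustration).
gender_dir = np.array([1.0, 0.0, 0.0])
gender_dir /= np.linalg.norm(gender_dir)  # unit length

word_vec = np.array([0.8, 0.3, 0.5])

# Remove the component of the word vector along the gender direction.
debiased = word_vec - (word_vec @ gender_dir) * gender_dir
print(debiased @ gender_dir)  # 0.0 -- no gender component remains
```

Adversarial training (option B) aims for the same end state, but learns the removal during training instead of assuming a single fixed gender direction.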

🔧 Debug (expert)
Debugging Fairness Metric Calculation

Given this code snippet to compute the demographic parity difference, what error or issue, if any, does it cause?

Python
def demographic_parity_difference(preds, groups):
    # preds: list of 0/1 predictions
    # groups: list of group labels (e.g., 'A', 'B')
    group_pos_rates = {}
    for g in set(groups):
        group_preds = [p for p, grp in zip(preds, groups) if grp == g]
        group_pos_rates[g] = sum(group_preds) / len(group_preds)
    return abs(group_pos_rates['A'] - group_pos_rates['B'])

# Example usage
preds = [1, 0, 1, 1, 0]
groups = ['A', 'A', 'B', 'B', 'B']
print(demographic_parity_difference(preds, groups))
A. No error; outputs 0.16666666666666663.
B. ZeroDivisionError if any group has zero members.
C. KeyError if groups contain labels other than 'A' or 'B'.
D. TypeError because sum is used on non-numeric data.
💡 Hint

Check the inputs and how the function handles group labels.
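Whatever the quiz answer, the function as written is brittle: it hardcodes the labels 'A' and 'B'. A hardened sketch (my own generalization, not part of the original snippet) that handles any number of groups by taking the largest pairwise gap in positive-prediction rates:

```python
def demographic_parity_difference_safe(preds, groups):
    """Max gap in positive-prediction rate across all observed groups."""
    rates = {}
    for g in set(groups):
        group_preds = [p for p, grp in zip(preds, groups) if grp == g]
        # len(group_preds) > 0 is guaranteed: g was drawn from groups itself.
        rates[g] = sum(group_preds) / len(group_preds)
    vals = list(rates.values())
    return max(vals) - min(vals)

# Same example as the quiz: rate(A) = 1/2, rate(B) = 2/3, gap = 1/6.
print(demographic_parity_difference_safe([1, 0, 1, 1, 0], ["A", "A", "B", "B", "B"]))
```

Because every key in `rates` comes from `set(groups)`, no KeyError or ZeroDivisionError is possible regardless of how the groups are labeled.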