Bias in generative models in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Bias in generative models
Problem: You have a generative AI model that creates text based on prompts. The model tends to produce biased or stereotypical outputs for certain groups, which is unfair and can cause harm.
Current Metrics: Bias score measured by a fairness metric is 0.35 (on a scale where 0 means no bias and 1 means high bias). The model generates text with biased language in 35% of tested samples.
Issue: The model shows significant bias in generated text, producing unfair stereotypes and unbalanced representations.
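The metric above reads as the fraction of tested samples flagged as biased, so 0.35 corresponds to 35 flagged samples out of 100. A minimal sketch of that calculation follows. The keyword-based `is_biased` check and the `FLAGGED_PHRASES` list are hypothetical stand-ins, since the experiment does not say which detector or fairness metric it uses.

```python
# Hypothetical flag list; a real evaluation would rely on a trained classifier
# or a human-reviewed rubric rather than keyword matching.
FLAGGED_PHRASES = ("all women are", "all men are", "those people always")


def is_biased(text: str) -> bool:
    """Return True if the text contains any flagged phrase (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in FLAGGED_PHRASES)


def bias_score(samples: list[str]) -> float:
    """Fraction of samples flagged as biased: 0 means none, 1 means all."""
    if not samples:
        raise ValueError("need at least one sample")
    flagged = sum(is_biased(s) for s in samples)
    return flagged / len(samples)


samples = [
    "All women are bad at math.",
    "The engineer reviewed her design.",
    "The nurse finished his shift.",
    "Those people always arrive late.",
]
print(bias_score(samples))  # 2 of 4 samples flagged -> 0.5
```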
Your Task
Reduce the bias score from 0.35 to below 0.15 while maintaining the quality of generated text.
You cannot reduce the size of the training data drastically.
You must keep the model architecture the same.
You can only adjust training methods or add bias mitigation techniques.
Solution
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel, Trainer, TrainingArguments

# Load pretrained model and tokenizer
model_name = 'gpt2'
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Assume tokenized fine-tuning and evaluation datasets, 'balanced_dataset' and
# 'balanced_eval_dataset', have been prepared to reduce bias.
# Each example pairs a prompt with an unbiased target text and provides 'labels' for the loss.

# Token IDs the penalty should discourage (placeholder words; replace with a reviewed list).
# Only the first sub-word token of each word is used, which is a simplification.
bias_token_ids = [tokenizer.encode(word, add_special_tokens=False)[0] for word in ['stereotype1', 'stereotype2']]

# Define a custom loss function to penalize biased outputs (simplified example)
class BiasMitigationTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs absorbs extra arguments that newer Trainer versions pass (e.g. num_items_in_batch)
        outputs = model(**inputs)
        # With 'labels' in inputs, GPT2LMHeadModel shifts logits and labels internally,
        # so each position is scored against the next token
        base_loss = outputs.loss
        # Dummy bias penalty: average probability mass placed on the bias tokens.
        # Probabilities are bounded in [0, 1]; raw logits could be pushed down without limit.
        probs = torch.softmax(outputs.logits, dim=-1)
        bias_penalty = probs[:, :, bias_token_ids].sum(dim=-1).mean()
        total_loss = base_loss + 0.1 * bias_penalty
        return (total_loss, outputs) if return_outputs else total_loss

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy='epoch',  # renamed to eval_strategy in newer transformers releases
    save_strategy='epoch',
    logging_dir='./logs',
    logging_steps=10,
    learning_rate=5e-5
)

trainer = BiasMitigationTrainer(
    model=model,
    args=training_args,
    train_dataset=balanced_dataset,
    eval_dataset=balanced_eval_dataset
)

trainer.train()
Fine-tuned the pretrained generative model on a balanced dataset to reduce bias.
Added a custom loss penalty to discourage biased token generation.
Kept the original model architecture unchanged.
Used training arguments with moderate learning rate and batch size for stable fine-tuning.
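The solution assumes a balanced dataset already exists. One common way to build one is counterfactual data augmentation, where each example is paired with a copy in which group terms are swapped. This technique is named here as an illustration and is not necessarily how the original dataset was prepared. The `SWAPS` table and helper names below are hypothetical.

```python
import re

# Illustrative swap pairs only; a real list would be larger and reviewed by hand.
# Ambiguous words such as "her" (his/him) are left out on purpose.
SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}
PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)


def swap_terms(text: str) -> str:
    """Replace each listed term with its counterpart, keeping initial capitalization."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return PATTERN.sub(repl, text)


def augment(texts: list[str]) -> list[str]:
    """Return the original texts plus a swapped copy of every text that changes."""
    augmented = []
    for text in texts:
        augmented.append(text)
        counterfactual = swap_terms(text)
        if counterfactual != text:
            augmented.append(counterfactual)
    return augmented


print(augment(["He is a doctor.", "The sky is blue."]))
# ['He is a doctor.', 'She is a doctor.', 'The sky is blue.']
```

Because only swapped copies are added, the dataset grows rather than shrinks, which respects the constraint on training data size.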
Results Interpretation

Before: Bias score = 0.35, biased outputs in 35% of samples.

After: Bias score = 0.12, biased outputs in 12% of samples.

Fine-tuning a generative model on balanced data and adding bias penalties during training can effectively reduce bias without changing the model structure.
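As a quick arithmetic check on the reported numbers, the sketch below compares the final score with the 0.15 target and computes the relative reduction. The variable names are illustrative.

```python
before = 0.35   # bias score before fine-tuning
after = 0.12    # bias score after fine-tuning
target = 0.15   # the task requires a score below this value

meets_target = after < target
relative_reduction = (before - after) / before

print(meets_target)                  # True
print(round(relative_reduction, 3))  # 0.657, i.e. roughly a 66% reduction
```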
Bonus Experiment
Try using adversarial training where a discriminator detects bias and the generator learns to avoid it.
💡 Hint
Implement a two-model setup where the discriminator guides the generator to produce unbiased text by providing feedback during training.
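The two-model idea can be sketched on toy vectors before attempting it with a language model. In the sketch below, a small "generator" network learns a representation for a task while a discriminator tries to recover a protected attribute from that representation, and the generator is rewarded when the discriminator fails. All networks, data, and the `adv_weight` parameter are assumptions made for illustration; applying this to text generation needs extra machinery that is not shown.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-ins: the generator builds a representation, the task head solves the
# main task, and the discriminator tries to detect the protected attribute.
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
task_head = nn.Linear(4, 2)
discriminator = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

gen_opt = torch.optim.Adam(
    list(generator.parameters()) + list(task_head.parameters()), lr=1e-2
)
disc_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-2)
ce = nn.CrossEntropyLoss()

# Synthetic data: feature 0 drives the task label, feature 1 the protected attribute.
x = torch.randn(64, 8)
task_labels = (x[:, 0] > 0).long()
protected = (x[:, 1] > 0).long()


def train_step(adv_weight: float = 0.5):
    # 1) Discriminator update on a detached representation (generator is frozen here).
    rep = generator(x).detach()
    disc_loss = ce(discriminator(rep), protected)
    disc_opt.zero_grad()
    disc_loss.backward()
    disc_opt.step()

    # 2) Generator update: lower the task loss while raising the discriminator's loss.
    rep = generator(x)
    task_loss = ce(task_head(rep), task_labels)
    adv_loss = ce(discriminator(rep), protected)
    gen_loss = task_loss - adv_weight * adv_loss
    gen_opt.zero_grad()
    gen_loss.backward()
    gen_opt.step()
    return task_loss.item(), disc_loss.item()


for _ in range(5):
    task_loss, disc_loss = train_step()
print(task_loss, disc_loss)
```

The two optimizers alternate so that each model trains against the other's latest state, and `adv_weight` trades task quality against how strongly the protected attribute is hidden.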

Practice

1. What is the main cause of bias in generative AI models?
easy
A. The speed of the computer
B. The programming language used
C. The data used to train the model
D. The color of the user interface

Solution

  1. Step 1: Understand what bias means in generative models

    Bias means the model gives unfair or unbalanced results.
  2. Step 2: Identify the source of bias

    Bias mainly comes from the data used to train the model, as it reflects existing patterns or prejudices.
  3. Final Answer:

    The data used to train the model -> Option C
  4. Quick Check:

    Bias source = training data [OK]
Hint: Bias mostly comes from training data, not code or hardware [OK]
Common Mistakes:
  • Thinking bias comes from programming language
  • Blaming hardware speed for bias
  • Confusing UI design with bias
2. Which of the following is the correct way to describe bias in generative models?
easy
A. Bias means the model produces unfair or unbalanced outputs
B. Bias means the model always predicts correctly
C. Bias means the model runs faster on some computers
D. Bias means the model uses more memory

Solution

  1. Step 1: Define bias in the context of generative models

    Bias refers to unfair or unbalanced outputs, not performance or resource use.
  2. Step 2: Match the correct description

    Option A correctly describes bias as producing unfair or unbalanced outputs. The other options describe accuracy, speed, or memory use.
  3. Final Answer:

    Bias means the model produces unfair or unbalanced outputs -> Option A
  4. Quick Check:

    Bias = unfair outputs [OK]
Hint: Bias is about fairness in output, not speed or memory [OK]
Common Mistakes:
  • Confusing bias with model accuracy
  • Mixing bias with hardware performance
  • Thinking bias relates to memory use
3. Consider a generative model trained on text data mostly from one culture. What is likely to happen when it generates stories about other cultures?
medium
A. It may produce biased or stereotyped stories about other cultures
B. It will generate perfectly balanced stories about all cultures
C. It will refuse to generate any story about other cultures
D. It will generate stories faster for other cultures

Solution

  1. Step 1: Understand training data influence

    The model learns patterns from its training data, so if data is mostly from one culture, it lacks diversity.
  2. Step 2: Predict output behavior

    When asked about other cultures, the model may produce biased or stereotyped stories due to limited or skewed data.
  3. Final Answer:

    It may produce biased or stereotyped stories about other cultures -> Option A
  4. Quick Check:

    Limited data causes biased outputs [OK]
Hint: Limited data diversity causes biased outputs [OK]
Common Mistakes:
  • Assuming model is unbiased regardless of data
  • Thinking model refuses to generate unknown topics
  • Confusing speed with bias
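A skew like the one in this question can often be spotted before training by measuring how much of the data each group contributes. The sketch below assumes every training example carries a group label. The label names and the `group_shares` helper are hypothetical.

```python
from collections import Counter


def group_shares(labels: list[str]) -> dict[str, float]:
    """Return the fraction of examples that belongs to each group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


# Hypothetical corpus: 8 texts from one culture, 1 each from two others.
labels = ["culture_a"] * 8 + ["culture_b"] + ["culture_c"]
shares = group_shares(labels)
print(shares)  # {'culture_a': 0.8, 'culture_b': 0.1, 'culture_c': 0.1}
```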
4. You notice your generative model outputs biased text favoring one gender. Which step can help fix this issue?
medium
A. Use a smaller batch size during training
B. Increase the model's learning rate
C. Reduce the number of training epochs
D. Add more balanced and diverse training data

Solution

  1. Step 1: Identify cause of bias

    Bias often comes from unbalanced training data that favors one group.
  2. Step 2: Choose corrective action

    Adding more balanced and diverse data helps the model learn fairer patterns and reduce bias.
  3. Final Answer:

    Add more balanced and diverse training data -> Option D
  4. Quick Check:

    Balanced data reduces bias [OK]
Hint: Fix bias by improving training data diversity [OK]
Common Mistakes:
  • Changing learning rate without addressing data
  • Adjusting batch size, which is unrelated to bias
  • Reducing epochs without fixing data
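When new data is not yet available, oversampling the under-represented group is a common stopgap. It balances group counts by repeating existing examples, so it is not a substitute for collecting more diverse data. The sketch below assumes examples are `(text, group)` pairs; the `oversample` helper is hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample(examples: list[tuple[str, str]], seed: int = 0) -> list[tuple[str, str]]:
    """Repeat examples from smaller groups until every group matches the largest."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for text, group in examples:
        by_group[group].append((text, group))
    target = max(len(items) for items in by_group.values())
    balanced = []
    for items in by_group.values():
        balanced.extend(items)
        # Draw the missing examples with replacement from the same group.
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced


examples = [
    ("He fixed the engine.", "male"),
    ("He led the meeting.", "male"),
    ("He wrote the report.", "male"),
    ("She designed the bridge.", "female"),
]
balanced = oversample(examples)
print(Counter(group for _, group in balanced))
# Counter({'male': 3, 'female': 3})
```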
5. A company wants to reduce bias in its generative model that creates job descriptions. Which combined approach is best to improve fairness?
hard
A. Remove all rare words from the training data
B. Use diverse training data and add fairness constraints during model training
C. Train the model faster with fewer epochs
D. Only increase the model size without changing data

Solution

  1. Step 1: Understand bias reduction methods

    Bias can be reduced by improving data diversity and applying fairness rules during training.
  2. Step 2: Evaluate options

    Option B combines more diverse training data with fairness constraints during training, which is more effective than changing only model size or training speed.
  3. Final Answer:

    Use diverse training data and add fairness constraints during model training -> Option B
  4. Quick Check:

    Data + fairness constraints = less bias [OK]
Hint: Combine diverse data with fairness rules for best bias fix [OK]
Common Mistakes:
  • Thinking bigger model alone fixes bias
  • Assuming faster training reduces bias (it does not)
  • Removing rare words, which harms data diversity
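A fairness constraint is often implemented as a penalty added to the training loss. The sketch below uses the gap between two groups' rates as the penalty. Here a rate could be, for example, the share of generated job descriptions for each group that use a particular kind of wording. The function names, the rates, and the `weight` parameter are assumptions for illustration.

```python
def parity_gap(rate_a: float, rate_b: float) -> float:
    """Absolute difference between two group rates; 0 means the groups are treated alike."""
    return abs(rate_a - rate_b)


def constrained_loss(task_loss: float, rate_a: float, rate_b: float, weight: float = 1.0) -> float:
    """Task loss plus a weighted penalty that grows with the gap between groups."""
    return task_loss + weight * parity_gap(rate_a, rate_b)


print(constrained_loss(0.5, rate_a=0.75, rate_b=0.25, weight=2.0))
# 0.5 + 2.0 * 0.5 = 1.5
```

A larger `weight` pushes training harder toward equal rates at some cost to the task loss, so it is usually tuned against output quality.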