Bias in generative models in Prompt Engineering / GenAI - Model Pipeline Trace

Model Pipeline - Bias in generative models

This pipeline shows how bias can enter and affect a generative AI model. It starts with data collection, moves through preprocessing, feature engineering, and training, and ends with generated outputs that may be biased or fair.

Data Flow - 5 Stages
1. Data Collection
Input: 10000 text samples
Operation: Gather text data from internet sources
Output: 10000 text samples
Example: "The doctor said...", "She is a nurse...", "He is a programmer..."
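One way to see how bias enters at collection time is to count how often gendered pronouns appear near occupation words in the gathered text. The sketch below is a minimal illustration, assuming simple regex tokenization, a fixed co-occurrence window, hypothetical word lists (`OCCUPATIONS`, `PRONOUNS`) and made-up toy sentences; it is not a standard bias metric.

```python
import re
from collections import Counter

# Hypothetical word lists; a real audit would use larger, curated lists.
OCCUPATIONS = {"doctor", "nurse", "programmer"}
PRONOUNS = {"he", "she"}

def pronoun_occupation_counts(samples, window=5):
    """Count (occupation, pronoun) pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for text in samples:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok in OCCUPATIONS:
                nearby = tokens[max(0, i - window): i + window + 1]
                for other in nearby:
                    if other in PRONOUNS:
                        counts[(tok, other)] += 1
    return counts

# Toy, hypothetical samples standing in for collected internet text
samples = [
    "He is a doctor.",
    "She is a nurse.",
    "He is a programmer.",
    "The nurse said she will help you.",
]
counts = pronoun_occupation_counts(samples)
for (occupation, pronoun), n in sorted(counts.items()):
    print(occupation, pronoun, n)
```

On these four sentences, "nurse" co-occurs with "she" twice and never with "he"; a skew like this in the collected data is what later stages can learn.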
2. Preprocessing
Input: 10000 text samples
Operation: Clean text, remove duplicates, tokenize
Output: 10000 tokenized samples
Example: [['The', 'doctor', 'said'], ['She', 'is', 'a', 'nurse']]
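A minimal sketch of the three preprocessing operations, assuming whitespace normalization, case-insensitive exact-match deduplication, and a simple regex tokenizer. The function name `preprocess` is hypothetical, and production pipelines usually rely on subword tokenizers instead.

```python
import re

def preprocess(samples):
    """Clean, deduplicate, and tokenize raw text samples (simplified sketch)."""
    seen = set()
    tokenized = []
    for text in samples:
        cleaned = " ".join(text.split())   # trim and collapse repeated whitespace
        key = cleaned.lower()              # case-insensitive duplicate check
        if key in seen:
            continue                       # skip exact duplicates
        seen.add(key)
        # Split into word tokens and standalone punctuation marks
        tokenized.append(re.findall(r"\w+|[^\w\s]", cleaned))
    return tokenized

raw = ["The doctor said", "  The  doctor said ", "She is a nurse"]
print(preprocess(raw))  # [['The', 'doctor', 'said'], ['She', 'is', 'a', 'nurse']]
```

Cleaning and tokenizing do not remove associations such as "She ... nurse" from the text; they carry through to the next stage.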
3. Feature Engineering
Input: 10000 tokenized samples
Operation: Convert tokens to embeddings
Output: 10000 samples x 300 embedding dimensions
Example: [[0.12, -0.05, ..., 0.33], [0.07, 0.11, ..., -0.02]]
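The stage's output shape (one 300-dimensional vector per sample) suggests that token vectors are pooled per sample. The sketch below assumes mean pooling over a hypothetical, randomly initialised embedding table (`build_embedding_table` and `embed_samples` are illustrative names). In a real model the vectors are learned during training, and generative models typically keep one vector per token rather than pooling.

```python
import random

EMBED_DIM = 300  # embedding size used in the stage description

def build_embedding_table(vocab, dim=EMBED_DIM, seed=0):
    """Hypothetical embedding table with small random vectors, one per token.
    In a trained model these vectors are learned, not random."""
    rng = random.Random(seed)
    return {tok: [rng.gauss(0.0, 0.1) for _ in range(dim)] for tok in sorted(vocab)}

def embed_samples(tokenized, table):
    """Mean-pool token vectors so each sample becomes a single dim-length vector."""
    out = []
    for tokens in tokenized:
        vecs = [table[t] for t in tokens]
        out.append([sum(col) / len(vecs) for col in zip(*vecs)])
    return out

tokenized = [["The", "doctor", "said"], ["She", "is", "a", "nurse"]]
vocab = {t for sample in tokenized for t in sample}
table = build_embedding_table(vocab)
vectors = embed_samples(tokenized, table)
print(len(vectors), len(vectors[0]))  # 2 300
```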
4. Model Training
Input: 10000 samples x 300 embedding dimensions
Operation: Train the generative model to predict next tokens
Output: Trained generative model
Example: The model learns statistical patterns from the data, such as how often 'doctor' is followed by 'he' versus 'she'.
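To make "learned patterns" concrete, the sketch below swaps in a much simpler technique than neural training: a count-based trigram model that estimates P(next token | previous two tokens). The corpus is a hypothetical, deliberately skewed toy set, and the function names are illustrative; the point is only that the model's probabilities mirror the skew in its data.

```python
from collections import Counter, defaultdict

def train_trigram(tokenized):
    """Count which token follows each pair of tokens (toy next-token model)."""
    counts = defaultdict(Counter)
    for tokens in tokenized:
        toks = [t.lower() for t in tokens]
        for a, b, nxt in zip(toks, toks[1:], toks[2:]):
            counts[(a, b)][nxt] += 1
    return counts

def next_token_probs(model, a, b):
    """Estimate P(next | a, b) from counts; empty dict if the pair was never seen."""
    c = model.get((a.lower(), b.lower()), Counter())
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()} if total else {}

# Hypothetical, deliberately skewed toy corpus
corpus = [
    ["The", "nurse", "said", "she", "will", "help"],
    ["The", "nurse", "said", "she", "was", "busy"],
    ["The", "nurse", "said", "he", "will", "help"],
    ["The", "doctor", "said", "he", "will", "call"],
]
model = train_trigram(corpus)
print(next_token_probs(model, "nurse", "said"))
print(next_token_probs(model, "doctor", "said"))
```

Here the model assigns "she" a probability of 2/3 after "nurse said" and "he" a probability of 1.0 after "doctor said", exactly the proportions in the toy corpus.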
5. Generation
Input: Prompt text
Operation: Generate new text based on learned patterns
Output: Generated text
Example: "The nurse said she will help you."
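Generation repeatedly asks the model for the next token and appends it. The sketch below uses greedy decoding (always pick the most probable token) over a hypothetical, hand-written next-token table keyed by the previous two tokens; the probabilities are illustrative assumptions, not taken from any real model.

```python
# Hypothetical next-token table: (previous two tokens) -> {candidate: probability}
NEXT_TOKEN_TABLE = {
    ("the", "nurse"): {"said": 1.0},
    ("nurse", "said"): {"she": 0.8, "he": 0.2},
    ("said", "she"): {"will": 1.0},
    ("said", "he"): {"will": 1.0},
    ("she", "will"): {"help": 1.0},
    ("he", "will"): {"help": 1.0},
    ("will", "help"): {"you": 1.0},
    ("help", "you"): {"<eos>": 1.0},
}

def generate(prompt_tokens, table, max_new_tokens=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = [t.lower() for t in prompt_tokens]
    for _ in range(max_new_tokens):
        dist = table.get(tuple(tokens[-2:]))
        if not dist:
            break                      # no known continuation for this context
        next_tok = max(dist, key=dist.get)
        if next_tok == "<eos>":
            break                      # end-of-sequence marker
        tokens.append(next_tok)
    return " ".join(tokens)

print(generate(["The", "nurse"], NEXT_TOKEN_TABLE))  # the nurse said she will help you
```

Because "she" outscores "he" after ("nurse", "said"), greedy decoding emits "she" every time, turning a statistical skew into a fixed output. Sampling from the distribution instead would produce "he" some of the time.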
Training Trace - Epoch by Epoch
(Chart: Loss vs. Epochs, epochs 1 to 7; values as in the table below.)
Epoch | Loss (lower is better) | Accuracy (higher is better) | Observation
1 | 2.3 | 0.25 | Model starts learning basic language patterns
3 | 1.8 | 0.40 | Model improves but is still biased toward frequent patterns
5 | 1.4 | 0.55 | Model captures more complex patterns; bias remains
7 | 1.1 | 0.65 | Model converges; bias in the data is reflected in its outputs
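If the loss in the training table is mean per-token cross-entropy in natural-log units (an assumption; the page does not say), it converts to perplexity via exp(loss). The sketch below applies that conversion to the four logged epochs; `TRACE` simply copies the table's rows.

```python
import math

# (epoch, loss, accuracy) rows copied from the training trace table
TRACE = [(1, 2.3, 0.25), (3, 1.8, 0.40), (5, 1.4, 0.55), (7, 1.1, 0.65)]

def perplexity(loss):
    """Perplexity, assuming `loss` is mean per-token cross-entropy in nats."""
    return math.exp(loss)

for epoch, loss, acc in TRACE:
    print(f"epoch {epoch}: loss={loss:.1f} perplexity~{perplexity(loss):.2f} accuracy={acc:.2f}")
```

Under that assumption, perplexity falls from roughly 10 to roughly 3. Nothing in these numbers measures bias: a lower loss only means the model fits the training distribution, skews included, more closely.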
Prediction Trace - 4 Layers
Layer 1: Input Prompt
Layer 2: Embedding Layer
Layer 3: Generative Model Prediction
Layer 4: Output Generation
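At the prediction layer, the model produces a score (logit) for each candidate next token, and a softmax turns those scores into probabilities that the output layer chooses from. The sketch below uses hypothetical scores for three candidates after the prompt "The nurse said"; the input and embedding layers are omitted, and greedy selection is assumed for the output layer.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (numerically stable form)."""
    m = max(logits.values())                       # subtract the max to avoid overflow
    exps = {tok: math.exp(s - m) for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Layer 3 (prediction): hypothetical scores after the prompt "The nurse said"
logits = {"she": 2.0, "he": 0.5, "they": 0.3}
probs = softmax(logits)

# Layer 4 (output generation): greedy choice of the most probable token
choice = max(probs, key=probs.get)
print({tok: round(p, 3) for tok, p in probs.items()}, choice)
```

A score gap that looks modest (2.0 versus 0.5) becomes a clear preference after the softmax, and greedy output then always picks "she".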
Model Quiz - 3 Questions
At which stage can bias first enter the generative model pipeline?
A) Model Training
B) Generation
C) Data Collection
D) Prediction
Key Insight
Bias in generative models often comes from the data they learn from. A decreasing loss only shows that the model fits its training data more closely, so a well-trained model can still reproduce the biases present in that data. Understanding each pipeline stage helps identify where bias enters and where it can be reduced.
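As one illustration of reducing bias at the data stage, the sketch below applies counterfactual data augmentation, a technique not described on this page: each sample is copied with gendered words swapped, so both variants appear in training. The swap list `SWAPS` is a hypothetical, deliberately tiny assumption, and ambiguous words such as "her" (which can correspond to "him" or "his") are left out.

```python
import re

# Hypothetical, deliberately tiny swap list. Ambiguous words such as "her"
# (which can correspond to "him" or "his") are intentionally left out.
SWAPS = {"he": "she", "she": "he", "man": "woman", "woman": "man", "boy": "girl", "girl": "boy"}

def swap_gender(text):
    """Return a copy of `text` with each listed gendered word replaced by its counterpart."""
    def replace(match):
        word = match.group(0)
        swapped = SWAPS.get(word.lower())
        if swapped is None:
            return word
        # Preserve a leading capital letter, e.g. "She" -> "He"
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"[A-Za-z]+", replace, text)

def augment(samples):
    """Counterfactual augmentation: keep the originals and add gender-swapped copies."""
    return samples + [swap_gender(s) for s in samples]

data = ["She is a nurse.", "He is a programmer."]
for s in augment(data):
    print(s)
```

After augmentation, "nurse" and "programmer" each appear once with "he" and once with "she". Word swapping can produce awkward sentences, so the swapped copies may need review before training.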