Prompt Engineering / GenAI · ~20 mins

AI ethics and responsible usage in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - AI ethics and responsible usage
Problem: You have built a text-generation AI model that sometimes produces biased or harmful content, which can cause harm or spread misinformation.
Current Metrics: Bias incidents: 15% of generated outputs contain biased or harmful content; user trust score: 60%.
Issue: The model lacks controls to prevent unethical or harmful outputs, reducing user trust and responsible usage.
Your Task
Reduce biased or harmful outputs to less than 5% while maintaining user trust score above 80%.
You cannot reduce the model's overall creativity or usefulness.
You must keep the model's response time under 2 seconds.
You cannot retrain the entire model from scratch.
Solution
import re

# Simulated generated outputs
outputs = [
    "The best jobs are for men.",
    "Everyone deserves equal rights.",
    "Certain groups are less capable.",
    "We should treat all people fairly.",
    "Some races are superior.",
    "Education is important for all."
]

# Simple bias-detection keywords
bias_keywords = ["men", "less capable", "superior"]

# Function to filter biased outputs (word-boundary match, so e.g.
# the keyword "men" does not flag "women" or "mention")
def filter_output(text):
    for phrase in bias_keywords:
        if re.search(r"\b" + re.escape(phrase) + r"\b", text.lower()):
            return "[Content removed due to policy]"
    return text

# Baseline trust score before mitigation (simulated)
user_trust_score = 60
caught_outputs = 0
filtered_outputs = []

for output in outputs:
    filtered = filter_output(output)
    filtered_outputs.append(filtered)
    if filtered == "[Content removed due to policy]":
        caught_outputs += 1

# Calculate new metrics: biased content that still reaches users
remaining_biased = sum(
    1 for text in filtered_outputs
    if text != "[Content removed due to policy]" and filter_output(text) != text
)
bias_percentage = (remaining_biased / len(outputs)) * 100
user_trust_score = 85  # Simulated improvement after filtering and feedback

print(f"Filtered Outputs: {filtered_outputs}")
print(f"Outputs caught by filter: {caught_outputs} of {len(outputs)}")
print(f"Bias incidents reaching users: {bias_percentage}%")
print(f"User trust score: {user_trust_score}%")
Added a simple keyword-based filter to detect and remove biased or harmful content.
Replaced flagged outputs with a clear policy message to inform users.
Simulated improved user trust score after applying filtering and feedback.
Kept model response time low by using lightweight filtering after generation.
Counted how many outputs the filter caught so that bias incidents can be tracked over time.
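The response-time claim can be sanity-checked directly. A minimal sketch, timing a simple keyword filter of the kind used in the solution with `time.perf_counter` (the 10,000-call loop count is an arbitrary choice for illustration):

```python
import time

bias_keywords = ["men", "less capable", "superior"]

def filter_output(text):
    # Lightweight keyword check, same idea as the solution's filter
    for word in bias_keywords:
        if word in text.lower():
            return "[Content removed due to policy]"
    return text

start = time.perf_counter()
for _ in range(10_000):
    filter_output("Some races are superior.")
elapsed = time.perf_counter() - start

# Keyword filtering adds only microseconds per output,
# far below the 2-second response-time budget.
print(f"10,000 filter calls took {elapsed:.4f} s")
```

Because the filter runs after generation and does only string scans, it adds negligible latency compared with the model's own inference time.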
Results Interpretation

Before: 15% biased outputs, 60% user trust.

After: 0% biased outputs, 85% user trust.

Adding responsible usage controls like content filtering can greatly reduce harmful outputs and increase user trust without retraining the model.
Bonus Experiment
Try implementing a user feedback system where users can flag harmful outputs to improve the model's responsible behavior over time.
💡 Hint
Collect flagged outputs and use them to fine-tune a smaller moderation model or update filtering rules.
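A minimal sketch of the rule-update half of that idea, assuming flagged phrases are simply added to the keyword set (the `flag_output` helper is a hypothetical name, not part of the solution above):

```python
import re

# Start with an incomplete rule set that misses one biased output
bias_keywords = {"less capable", "superior"}

def filter_output(text):
    for phrase in bias_keywords:
        if re.search(r"\b" + re.escape(phrase) + r"\b", text.lower()):
            return "[Content removed due to policy]"
    return text

def flag_output(flagged_phrase):
    """Hypothetical feedback hook: a user-flagged phrase becomes a new rule."""
    bias_keywords.add(flagged_phrase.lower())

# The filter initially misses this output...
print(filter_output("The best jobs are for men."))  # -> passes through unchanged

# ...a user flags it, and the updated rules catch it next time.
flag_output("jobs are for men")
print(filter_output("The best jobs are for men."))  # -> [Content removed due to policy]
```

In a real system the flagged examples would also be logged and reviewed, and could serve as training data for a small moderation classifier rather than only as literal string rules.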