For AI safety and misuse prevention, key metrics include False Positive Rate and False Negative Rate. False positives mean the system wrongly blocks safe actions, causing inconvenience. False negatives mean the system misses harmful actions, allowing misuse. Balancing these helps keep AI safe and useful.
Why AI safety prevents misuse in Prompt Engineering / GenAI - Why Metrics Matter
| | Predicted Safe | Predicted Unsafe |
|---|----------------|------------------|
| Actually Safe | True Negative (TN) | False Positive (FP) |
| Actually Unsafe | False Negative (FN) | True Positive (TP) |
Total samples = TP + FP + TN + FN
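As a minimal sketch of how the two error rates follow from these four counts, the snippet below treats "unsafe" as the positive class. The function name `error_rates` and the example counts are hypothetical, chosen only for illustration.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute FPR and FNR, treating 'unsafe' as the positive class."""
    total = tp + fp + tn + fn
    # FPR: share of actually safe items that were wrongly blocked.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # FNR: share of actually unsafe items that slipped through.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {"total": total, "fpr": fpr, "fnr": fnr}


# Hypothetical counts, not taken from any real system.
rates = error_rates(tp=40, fp=10, tn=90, fn=10)
print(f"total={rates['total']} FPR={rates['fpr']:.2f} FNR={rates['fnr']:.2f}")
```

With these made-up counts, 10 of 100 safe items are blocked and 10 of 50 unsafe items are missed.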
Imagine an AI that blocks harmful content online.
- High Precision: Most blocked content is truly harmful. Few safe posts are blocked. Good for user trust.
- High Recall: Most harmful content is caught. Few harmful posts slip through. Good for safety.
Very high precision with low recall means some harmful content gets missed. Very high recall with low precision means many safe posts get wrongly blocked. AI safety aims to find the right balance between the two.
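The trade-off can be illustrated with a small sketch that blocks content whenever a harm score reaches a threshold. The scores, labels, and the `precision_recall` helper are all hypothetical; a real moderation system would produce its own scores.

```python
# Hypothetical (harm_score, is_actually_harmful) pairs for eight posts.
samples = [
    (0.95, True), (0.90, True), (0.70, False), (0.60, True),
    (0.40, False), (0.30, True), (0.10, False), (0.05, False),
]


def precision_recall(threshold: float) -> tuple:
    """Block every post whose score is >= threshold, then score the result."""
    tp = sum(1 for score, harmful in samples if score >= threshold and harmful)
    fp = sum(1 for score, harmful in samples if score >= threshold and not harmful)
    fn = sum(1 for score, harmful in samples if score < threshold and harmful)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


strict = precision_recall(0.8)   # blocks only high-confidence cases
lenient = precision_recall(0.2)  # blocks anything mildly suspicious
print(f"strict:  precision={strict[0]:.2f} recall={strict[1]:.2f}")
print(f"lenient: precision={lenient[0]:.2f} recall={lenient[1]:.2f}")
```

On this toy data the strict threshold never blocks a safe post but misses half of the harmful ones, while the lenient threshold catches every harmful post at the cost of blocking two safe ones.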
Good: Precision and recall both above 0.8 means AI blocks most harmful content and rarely blocks safe content.
Bad: Precision below 0.5 means many safe actions blocked (annoying users). Recall below 0.5 means many harmful actions missed (unsafe).
- Accuracy paradox: If harmful actions are rare, high accuracy can hide poor detection.
- Data leakage: If the training and test data share information, metrics look better than real-world performance.
- Overfitting: Model works well on training but fails on new misuse types.
Your AI safety model has 98% accuracy but only 12% recall on harmful misuse. Is it good for production?
Answer: No. It misses 88% of harmful misuse, which is unsafe. High accuracy is misleading because harmful cases are rare. Improving recall is critical.
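To see how 98% accuracy and 12% recall can coexist, here is a sketch with one set of hypothetical counts that reproduces both figures. The counts assume 10,000 samples of which 200 are harmful; they are illustrative only and not taken from any real evaluation.

```python
# Hypothetical counts: 10,000 samples, only 200 of them harmful.
tp, fn = 24, 176     # harmful cases caught vs. missed
fp, tn = 24, 9776    # safe cases wrongly blocked vs. correctly allowed

total = tp + fp + tn + fn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
missed_share = fn / (tp + fn)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} missed={missed_share:.2f}")
```

Because the 9,776 correctly allowed safe cases dominate the total, accuracy stays high even though 176 of the 200 harmful cases go undetected.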
Practice
Solution
Step 1: Understand the purpose of AI safety
AI safety focuses on preventing harmful effects from AI systems.
Step 2: Compare options to the purpose
Only preventing harm matches the main goal of AI safety.
Final Answer:
It helps prevent AI from causing harm to people. -> Option A
Quick Check:
AI safety = prevent harm [OK]
- Confusing safety with performance improvements
- Thinking safety means AI is always correct
- Assuming safety increases cost only
Solution
Step 1: Identify AI safety rules
AI safety includes rules like fairness, transparency, and privacy.
Step 2: Match options to safety rules
Only respecting user privacy fits as a safety rule.
Final Answer:
Ensure AI respects user privacy. -> Option D
Quick Check:
Privacy rule = Ensure AI respects user privacy. [OK]
- Choosing options that ignore fairness or transparency
- Confusing speed or secrecy with safety
- Ignoring user rights in AI use
def check_safety(data):
    if 'private_info' in data:
        return False
    return True

result = check_safety({'name': 'Alice', 'private_info': 'secret'})
print(result)
What will be the output?
Solution
Step 1: Analyze the function check_safety
The function returns False if 'private_info' is in the data dictionary.
Step 2: Check the input dictionary
The input contains 'private_info', so the function returns False.
Final Answer:
False -> Option C
Quick Check:
Contains 'private_info' = False [OK]
- Assuming function returns True always
- Confusing key presence check logic
- Expecting runtime error due to dictionary
banned_words = ['hack', 'steal', 'attack']

def is_safe(text):
    for word in banned_words:
        if word in text:
            return False
    return True

print(is_safe('Try to Hack the system'))
Solution
Step 1: Understand the function behavior
The function checks if any banned word is in the text exactly as is.
Step 2: Identify case sensitivity issue
The input text has 'Hack' with uppercase H, but banned_words are lowercase, so 'hack' is not found.
Final Answer:
The check is case-sensitive and misses 'Hack'. -> Option A
Quick Check:
Case sensitivity causes miss = The check is case-sensitive and misses 'Hack'. [OK]
- Assuming banned_words is empty
- Thinking function always returns True
- Ignoring case differences in text
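One possible fix for the case-sensitivity bug, sketched under the assumption that normalizing the input is acceptable, is to lowercase the text before checking it. This is one illustrative repair, not the only one.

```python
banned_words = ['hack', 'steal', 'attack']


def is_safe(text):
    # Normalize case so 'Hack', 'HACK' and 'hack' are treated alike.
    lowered = text.casefold()
    return not any(word in lowered for word in banned_words)


print(is_safe('Try to Hack the system'))  # False
print(is_safe('Have a nice day'))         # True
```

This version still uses substring matching, so a harmless word such as 'shack' would also be flagged; matching on whole words would reduce such false positives.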
Solution
Step 1: Evaluate each approach for safety
Ignoring input (A) or disabling AI (D) removes usefulness; secret logging (C) lacks transparency.
Step 2: Identify best combined approach
Transparency and fairness (B) are core AI safety principles to explain decisions and avoid bias.
Final Answer:
Use transparency to explain AI decisions and apply fairness to avoid bias. -> Option B
Quick Check:
AI safety = transparency + fairness [OK]
- Thinking ignoring input is safe
- Assuming disabling AI is practical
- Ignoring transparency importance
