Imagine you have a smart assistant that can do many tasks. Why do we need AI safety rules to stop it from being used in harmful ways?
Think about how rules help keep things safe in real life, like traffic laws.
AI safety focuses on making sure AI systems act ethically and avoid causing harm, whether that harm comes from deliberate misuse by bad actors or from accidents.
You want to build an AI chatbot that avoids giving harmful advice. Which model choice helps reduce misuse risk?
Think about how human feedback can teach AI to be safer.
Training with human feedback helps the AI learn to avoid harmful or unsafe responses, reducing misuse risk.
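A toy sketch of the idea behind learning from human feedback: human raters score example responses, and the system is steered toward the higher-rated one. Real systems train a reward model on such ratings; the `human_ratings` data and `pick_safer` helper here are hypothetical illustrations, not an actual training loop.

```python
# Hypothetical human ratings: raters score each candidate response,
# with higher values meaning safer / more helpful.
human_ratings = {
    "Here is how to pick a lock": -1.0,  # rated unsafe by raters
    "I can't help with that, but here's a locksmith directory": 1.0,  # rated safe
}

def pick_safer(candidates):
    # Choose the candidate with the highest human-assigned score.
    return max(candidates, key=lambda c: human_ratings.get(c, 0.0))

print(pick_safer(list(human_ratings)))
```

In full RLHF the ratings train a reward model that generalizes to unseen responses; this lookup table only illustrates the direction of the feedback signal.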
You want to check if your AI system is safe and not misused. Which metric best measures this?
Think about how to detect harmful outputs from the AI.
Tracking how often AI outputs are flagged as harmful helps measure how well safety measures prevent misuse.
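The metric above can be computed directly once each output carries a moderation flag. A minimal sketch, assuming a hypothetical list of boolean flags from a content moderation check:

```python
# Hypothetical moderation flags: True means the output was flagged as harmful.
flags = [False, False, True, False, True, False, False, False, False, False]

# Harmful-output rate: fraction of outputs flagged by the safety check.
harmful_rate = sum(flags) / len(flags)
print(f"Harmful-output rate: {harmful_rate:.1%}")
```

Tracking this rate over time shows whether safety measures are actually reducing harmful outputs.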
Given this code snippet that filters harmful AI outputs, which option explains why harmful content still appears?
def filter_output(text):
    harmful_words = ['hack', 'attack', 'steal']
    for word in harmful_words:
        if word in text:
            return 'Content blocked due to safety.'
    return text

output = filter_output(ai_response)

Think about how text matching works with uppercase and lowercase letters.
The filter does case-sensitive substring matching against lowercase words, so it misses harmful words written with uppercase letters (for example, 'Hack'), allowing misuse to slip through.
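One way to fix the case-sensitivity gap is to lowercase the text before matching. This is a minimal sketch that keeps the quiz snippet's word list and block message:

```python
def filter_output(text):
    harmful_words = ['hack', 'attack', 'steal']
    # Lowercase the input so 'Hack', 'HACK', etc. also match.
    lowered = text.lower()
    for word in harmful_words:
        if word in lowered:
            return 'Content blocked due to safety.'
    return text

print(filter_output('How to HACK a server'))  # → Content blocked due to safety.
print(filter_output('Nice weather today'))    # → Nice weather today
```

Note that plain substring matching still has other weaknesses, such as flagging benign words that contain a blocked word (e.g. 'stealth'), which real moderation systems handle with more robust methods.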
You train a language model and want to reduce the chance it generates harmful content. Which hyperparameter adjustment helps most?
Think about how randomness affects the safety of generated text.
Lowering the temperature reduces sampling randomness, concentrating probability on the most likely tokens and making the model less likely to produce unexpected or harmful outputs.
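The effect of temperature can be seen by scaling logits before a softmax. A minimal sketch with hypothetical logits for three candidate tokens; `temperature_softmax` is an illustrative helper, not a library function:

```python
import math

def temperature_softmax(logits, temperature):
    # Divide logits by the temperature, then apply a numerically
    # stable softmax. Lower temperature sharpens the distribution
    # toward the top token.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]
print(temperature_softmax(logits, 1.0))  # moderately spread distribution
print(temperature_softmax(logits, 0.3))  # top token dominates
```

At temperature 0.3 the top token's probability is much higher than at 1.0, which is why lower temperature makes sampling more predictable.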