Experiment - Output filtering and safety checks
Problem: You have an AI agent that generates text outputs. Some outputs contain unsafe or inappropriate content, which can cause harm or violate usage policies.
Current Metrics: Safety violation rate: 15% of outputs contain unsafe content. User satisfaction: 70%.
Issue: The AI agent produces unsafe outputs too often, eroding user trust and usability.
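One mitigation for this experiment is a post-generation filter that screens each output before it reaches the user. Below is a minimal sketch, assuming a simple regex blocklist; the pattern list, refusal message, and function name are illustrative placeholders, and a production system would pair such a filter with a trained safety classifier rather than keyword matching alone.

```python
import re

# Illustrative blocklist (hypothetical patterns, not a complete policy).
UNSAFE_PATTERNS = [
    re.compile(r"\bhow to build a bomb\b", re.IGNORECASE),
    re.compile(r"\bkill yourself\b", re.IGNORECASE),
]

REFUSAL = "[Output withheld: content flagged by safety filter.]"

def filter_output(text: str) -> str:
    """Return the text unchanged if no pattern matches, else a refusal placeholder."""
    for pattern in UNSAFE_PATTERNS:
        if pattern.search(text):
            return REFUSAL
    return text
```

Running every agent output through such a filter (and logging the flag rate) would also give a direct way to track the safety violation metric over time.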
