Experiment - Content filtering
Problem: You have a text generation model that sometimes produces inappropriate or harmful content. The goal is to filter out such content so outputs remain safe and user-friendly.
Current Metrics: Based on manual review, roughly 10% of the model's outputs contain inappropriate content.
Issue: The model lacks a content-filtering mechanism, so unsafe outputs reach users and erode trust.
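As a starting point, the missing filtering step could be sketched as a post-generation check that screens model outputs before they are returned. The snippet below is a minimal illustration using a hypothetical keyword blocklist (`BLOCKED_PATTERNS` and the pattern entries are placeholders, not from the experiment); a production system would more likely use a trained safety classifier or a moderation API.

```python
import re

# Hypothetical blocklist for illustration only; a real deployment
# would rely on a safety classifier or moderation service.
BLOCKED_PATTERNS = [
    r"\bkill\b",
    r"\bhate\b",
]

def is_safe(text: str) -> bool:
    """Return False if the text matches any blocked pattern."""
    lowered = text.lower()
    return not any(re.search(p, lowered) for p in BLOCKED_PATTERNS)

def filter_output(text: str, fallback: str = "[content removed]") -> str:
    """Pass safe generations through; replace unsafe ones with a fallback."""
    return text if is_safe(text) else fallback
```

Running the filter over generated text before display would also make the 10% figure measurable automatically: the fraction of outputs replaced by the fallback is a direct proxy for the inappropriate-content rate.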