Content Filtering in Prompt Engineering / GenAI - Model Metrics & Evaluation

Content filtering models classify content as safe or harmful. The key metrics are precision and recall. Precision measures what fraction of flagged content is truly harmful; recall measures what fraction of harmful content was caught. High recall matters for catching bad content, while high precision avoids wrongly blocking good content. Balancing the two is critical.
|                  | Predicted Harmful   | Predicted Safe      |
|------------------|---------------------|---------------------|
| Actually Harmful | True Positive (TP)  | False Negative (FN) |
| Actually Safe    | False Positive (FP) | True Negative (TN)  |
Example:
TP = 80 (harmful content caught)
FP = 20 (safe content wrongly blocked)
TN = 900 (safe content allowed)
FN = 10 (harmful content missed)
Total samples = 80 + 20 + 900 + 10 = 1010
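A minimal sketch of computing the metrics from the counts above (the variable names are mine, not from any particular library):

```python
# Counts from the example above.
TP, FP, TN, FN = 80, 20, 900, 10

precision = TP / (TP + FP)                    # 80 / 100  = 0.80
recall    = TP / (TP + FN)                    # 80 / 90  ~= 0.889
accuracy  = (TP + TN) / (TP + FP + TN + FN)   # 980 / 1010 ~= 0.970

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

So with these counts the filter catches about 89% of harmful content, and 80% of what it blocks is truly harmful.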
If the model blocks aggressively to maximize recall, it also blocks good posts, so precision drops and users get annoyed. If it blocks conservatively to keep precision high, harmful content slips through and recall drops. For example, a social media platform wants to catch all hate speech (high recall) while also avoiding blocking normal posts (high precision). The right balance depends on the platform's goals and the cost of each type of error.
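The tradeoff usually comes down to the decision threshold on the model's score. A toy sketch with made-up scores and labels (not from the source) shows how raising the threshold trades recall for precision:

```python
# Hypothetical toy data: model scores (probability of "harmful") and true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    1,    0,    1,    0,    0,    1,    0,    0]  # 1 = harmful

def precision_recall(threshold):
    # Flag everything at or above the threshold as harmful.
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.25, 0.50, 0.75):
    p, r = precision_recall(t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, moving the threshold from 0.25 to 0.75 raises precision while lowering recall; a real platform tunes this threshold to its own tolerance for missed harm versus wrongly blocked posts.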
Good: precision around 0.90 with recall around 0.85 means most harmful content is caught and few good posts are blocked.
Bad: precision 0.50 with recall 0.95 means half of the flagged posts are actually safe and get wrongly blocked. Likewise, precision 0.95 with recall 0.40 means most harmful posts slip through.
- Accuracy paradox: If harmful content is rare, a model that always predicts safe can have high accuracy but is useless.
- Data leakage: If test data leaks information from training, metrics look better than real-world performance will be.
- Overfitting: Very high training metrics but poor real-world performance.
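The accuracy paradox from the list above can be sketched with hypothetical imbalanced data: a filter that never flags anything still scores 99% accuracy.

```python
# Hypothetical imbalanced dataset: 1000 posts, only 10 harmful (label 1).
labels = [1] * 10 + [0] * 990

# A useless filter that predicts "safe" (0) for everything.
preds = [0] * len(labels)

tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
correct = sum(1 for y, p in zip(labels, preds) if y == p)

accuracy = correct / len(labels)   # 990 / 1000 = 0.99
recall = tp / (tp + fn)            # 0 / 10 = 0.0
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

Despite 99% accuracy, the filter catches zero harmful posts, which is why recall on the harmful class is the metric to watch when classes are imbalanced.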
Your content filter model has 98% accuracy but only 12% recall on harmful content. Is it good for production? Why or why not?
Answer: No, it is not good. The model misses 88% of harmful content (12% recall), which is dangerous for users. The high accuracy is misleading because harmful content is rare, so predicting "safe" most of the time inflates accuracy. Improving recall is critical before production use.
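One hypothetical confusion matrix consistent with the numbers in the question (the counts are my own illustration, not from the source) makes the answer concrete:

```python
# 10,000 posts, 200 harmful (2% prevalence) -- chosen to reproduce
# 98% accuracy and 12% recall from the question.
TP, FN = 24, 176     # recall = 24 / 200 = 0.12 -> 176 harmful posts missed
FP, TN = 24, 9776    # precision = 24 / 48 = 0.50

accuracy = (TP + TN) / (TP + FP + TN + FN)   # 9800 / 10000 = 0.98
recall = TP / (TP + FN)                      # 0.12

print(f"accuracy={accuracy:.2f} recall={recall:.2f} missed_harmful={FN}")
```

Here the model looks excellent by accuracy yet lets 176 of 200 harmful posts through, which is exactly the accuracy paradox described above.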