For input validation and sanitization, the key metrics are the False Positive Rate and the False Negative Rate. Treating "valid" as the positive class, a false positive is a bad input that is wrongly accepted, and a false negative is a good input that is wrongly rejected. Minimizing false positives is critical to avoid security risks, while minimizing false negatives keeps the system user-friendly.
|                  | Predicted Valid  | Predicted Invalid  |
|------------------|------------------|--------------------|
| Actually valid   | True Valid (TV)  | False Invalid (FI) |
| Actually invalid | False Valid (FV) | True Invalid (TI)  |
Total inputs = TV + FI + FV + TI
- True Valid (TV): Correctly accepted good inputs
- False Invalid (FI): Good inputs wrongly rejected
- False Valid (FV): Bad inputs wrongly accepted
- True Invalid (TI): Correctly rejected bad inputs
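As a rough illustration of the four categories above, the following sketch tallies them from labeled data. The function name `confusion_counts` and the sample labels are hypothetical, chosen only for this example; each list holds one boolean per input, where `True` means "valid".

```python
def confusion_counts(actual_valid, predicted_valid):
    """Tally TV, FI, FV and TI from two parallel lists of booleans."""
    if len(actual_valid) != len(predicted_valid):
        raise ValueError("both lists must have the same length")
    counts = {"TV": 0, "FI": 0, "FV": 0, "TI": 0}
    for actual, predicted in zip(actual_valid, predicted_valid):
        if actual and predicted:
            counts["TV"] += 1   # good input, accepted
        elif actual and not predicted:
            counts["FI"] += 1   # good input, wrongly rejected
        elif not actual and predicted:
            counts["FV"] += 1   # bad input, wrongly accepted
        else:
            counts["TI"] += 1   # bad input, rejected
    return counts

# Hypothetical labels for six inputs (assumed data, not from a real system).
actual = [True, True, True, False, False, True]
predicted = [True, False, True, True, False, True]
counts = confusion_counts(actual, predicted)
print(counts)
```

The four counts always add up to the total number of inputs, which is a quick sanity check on any evaluation script.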
Precision here is the fraction of accepted inputs that are actually good: TV / (TV + FV). High precision means few bad inputs get through.
Recall is the fraction of all good inputs that are accepted: TV / (TV + FI). High recall means few good inputs are wrongly blocked.
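The definitions above translate directly into arithmetic on the four counts. This is a minimal sketch that treats "valid" as the positive class; the helper name `validator_metrics` and the example counts are assumptions made for illustration.

```python
def validator_metrics(tv, fi, fv, ti):
    """Compute precision, recall and the two error rates from TV, FI, FV, TI."""
    def ratio(numerator, denominator):
        # Return NaN instead of raising when a class is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tv, tv + fv),            # accepted inputs that are good
        "recall": ratio(tv, tv + fi),               # good inputs that are accepted
        "false_positive_rate": ratio(fv, fv + ti),  # bad inputs wrongly accepted
        "false_negative_rate": ratio(fi, fi + tv),  # good inputs wrongly rejected
    }

# Assumed counts: 100 good inputs and 100 bad inputs.
metrics = validator_metrics(tv=90, fi=10, fv=5, ti=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that recall and the false negative rate always sum to 1, since both are measured over the same set of good inputs.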
Example: If you block too many inputs to be safe, recall drops (good inputs rejected). If you accept too many inputs, precision drops (bad inputs accepted).
Balance depends on use case: For security, prioritize precision (block bad inputs). For user experience, prioritize recall (accept good inputs).
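To make the tradeoff concrete, suppose the validator produces a score per input and accepts anything at or above a threshold. That scoring setup, the sample data, and the function name `precision_recall_at` are all assumptions for this sketch, not something the text prescribes.

```python
# Hypothetical (score, is_actually_valid) pairs; a higher score means the
# input looks safer to the validator.
samples = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.70, True), (0.60, True), (0.40, False), (0.30, True),
    (0.20, False), (0.10, False),
]

def precision_recall_at(threshold, samples):
    """Accept inputs scoring at or above the threshold, then measure the result."""
    tv = sum(1 for score, valid in samples if score >= threshold and valid)
    fv = sum(1 for score, valid in samples if score >= threshold and not valid)
    fi = sum(1 for score, valid in samples if score < threshold and valid)
    precision = tv / (tv + fv) if (tv + fv) else float("nan")
    recall = tv / (tv + fi) if (tv + fi) else float("nan")
    return precision, recall

strict = precision_recall_at(0.85, samples)   # blocks aggressively
lenient = precision_recall_at(0.25, samples)  # accepts almost everything
print("strict :", strict)
print("lenient:", lenient)
```

On this made-up data the strict threshold lets no bad input through but rejects half of the good ones, while the lenient threshold accepts every good input at the cost of admitting some bad ones.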
- Good: Precision > 0.95 and Recall > 0.90 means fewer than 5% of accepted inputs are bad and more than 90% of good inputs are accepted.
- Bad: Precision < 0.70 means more than 30% of accepted inputs are bad, risking security.
- Bad: Recall < 0.70 means more than 30% of good inputs are blocked, frustrating users.
- Accuracy paradox: If bad inputs are rare, high accuracy can hide poor detection of bad inputs.
- Data leakage: Test inputs that are too similar to the training inputs can inflate the metrics.
- Overfitting: The model may block only the bad inputs it has already seen and fail on new types.
- Ignoring user impact: High false invalid rate frustrates users even if security is strong.
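The accuracy paradox from the list above can be reproduced with a few lines. The sketch assumes a traffic mix of 980 good and 20 bad inputs and a deliberately useless validator that accepts everything; both are invented for illustration.

```python
# Assumed traffic mix: 980 good inputs followed by 20 bad inputs.
actual_valid = [True] * 980 + [False] * 20
# A validator that accepts every input without checking anything.
predicted_valid = [True] * 1000

pairs = list(zip(actual_valid, predicted_valid))
accuracy = sum(1 for actual, predicted in pairs if actual == predicted) / len(pairs)

bad_total = actual_valid.count(False)
bad_blocked = sum(1 for actual, predicted in pairs if not actual and not predicted)
bad_block_rate = bad_blocked / bad_total

print(f"accuracy: {accuracy:.2f}")
print(f"share of bad inputs blocked: {bad_block_rate:.2f}")
```

Accuracy looks excellent because good inputs dominate the data, yet not a single bad input is blocked. Reporting the per-class rates alongside accuracy avoids this blind spot.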
Your input validation model has 98% accuracy but only 12% recall on good inputs. Is it good for production? Why or why not?
Answer: No, it is not good. A recall of 12% means 88% of good inputs are wrongly blocked, causing poor user experience. The 98% accuracy mostly reflects correctly rejected bad inputs, so it hides this failure.
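One way to see how 98% accuracy and 12% recall can coexist is to plug in counts that reproduce both numbers. The counts below are hypothetical, picked only so the arithmetic matches the question; many other combinations would work as well.

```python
# Hypothetical counts chosen to match the question (not real measurements).
tv, fi = 24, 176     # 200 good inputs, only 24 of them accepted
fv, ti = 24, 9776    # 9,800 bad inputs, almost all rejected

total = tv + fi + fv + ti
accuracy = (tv + ti) / total
recall = tv / (tv + fi)
precision = tv / (tv + fv)
good_share = (tv + fi) / total

print(f"accuracy:   {accuracy:.2f}")
print(f"recall:     {recall:.2f}")
print(f"precision:  {precision:.2f}")
print(f"good share: {good_share:.2f}")
```

In this scenario good inputs are only 2% of the evaluation data, so rejecting nearly everything still scores well on accuracy while legitimate users are almost always blocked.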
