Experiment - Red teaming and adversarial testing
Problem: You have a text classification model that performs well on normal inputs but may fail on tricky or misleading inputs crafted to confuse it.
Current Metrics: Training accuracy: 95%, Validation accuracy: 90%, Adversarial test accuracy: 60%
Issue: The model is vulnerable to adversarial inputs: accuracy drops from 90% on the validation set to 60% on these tricky examples.
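The gap above can be reproduced in miniature. The sketch below is a hypothetical setup, not the model from this experiment: a toy keyword-lookup sentiment classifier, a simple character-level typo attack (one common red-teaming perturbation), and an evaluation harness that reports clean vs. adversarial accuracy. The keyword sets, `perturb` strategy, and examples are all illustrative assumptions.

```python
import random

# Hypothetical toy classifier: counts positive/negative keyword hits.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "awful", "hate"}

def classify(text: str) -> str:
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return "positive" if pos >= neg else "negative"

def perturb(text: str, rng: random.Random) -> str:
    """Adversarial perturbation: inject a typo into each sentiment keyword
    so the exact-match lookup misses it, while a human reader still
    understands the sentence (e.g. "great" -> "grxeat")."""
    out = []
    for w in text.split():
        if w.lower() in POSITIVE | NEGATIVE:
            i = rng.randrange(1, len(w))
            w = w[:i] + "x" + w[i:]
        out.append(w)
    return " ".join(out)

def accuracy(examples, attack=None, seed=0):
    """Evaluate on (text, label) pairs, optionally attacking each input."""
    rng = random.Random(seed)
    correct = 0
    for text, label in examples:
        if attack:
            text = attack(text, rng)
        correct += classify(text) == label
    return correct / len(examples)

examples = [
    ("what a great movie , I love it", "positive"),
    ("excellent food and good service", "positive"),
    ("terrible plot and awful acting", "negative"),
    ("I hate this bad phone", "negative"),
]

clean = accuracy(examples)
adv = accuracy(examples, attack=perturb)
print(f"clean accuracy: {clean:.0%}, adversarial accuracy: {adv:.0%}")
# -> clean accuracy: 100%, adversarial accuracy: 50%
```

The same pattern scales up to a real model: hold the evaluation harness fixed, swap in stronger attacks (synonym substitution, paraphrasing, gradient-guided token swaps), and track adversarial accuracy as a first-class metric alongside validation accuracy.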