
Prompt injection defense in Prompt Engineering / GenAI - Model Metrics & Evaluation

Which metric matters for prompt injection defense and WHY

For prompt injection defense, the key metrics are Precision and Recall on detecting malicious prompts.

Recall is crucial because we want to catch as many harmful injections as possible to keep the AI safe.

Precision is also important to avoid blocking good user prompts by mistake, which would hurt user experience.

Balancing these two helps ensure the defense is effective without being too strict or too loose.

Confusion matrix for prompt injection detection
      |                  | Predicted Injection | Predicted Safe      |
      |------------------|---------------------|---------------------|
      | Actual Injection | True Positive (TP)  | False Negative (FN) |
      | Actual Safe      | False Positive (FP) | True Negative (TN)  |

      Example with 100 prompts:
      TP = 40 (correctly caught injections)
      FP = 10 (safe prompts wrongly blocked)
      TN = 45 (safe prompts correctly allowed)
      FN = 5  (injections missed)

      Total = 40 + 10 + 45 + 5 = 100
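The counts above can be turned into the metrics directly. A minimal sketch in Python, using the same example numbers:

```python
# Metrics for the 100-prompt example above.
tp, fp, tn, fn = 40, 10, 45, 5

precision = tp / (tp + fp)          # 40 / 50 = 0.80
recall = tp / (tp + fn)             # 40 / 45 ≈ 0.89
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + tn + fn)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
# precision=0.80 recall=0.89 f1=0.84 accuracy=0.85
```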
    
Precision vs Recall tradeoff with examples

If the defense is very strict, it catches almost all injections (high recall) but blocks many safe prompts (low precision). This frustrates users.

If the defense is too loose, it lets many injections through (low recall) but rarely blocks safe prompts (high precision). This risks AI misuse.

Example:

  • High recall, low precision: 95% of injections caught, but 30% of safe prompts blocked.
  • High precision, low recall: almost every blocked prompt is a real injection, but only 60% of injections are caught.

We want a balance that keeps the AI safe and users happy.
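The tradeoff usually comes down to a decision threshold: most detectors output an injection-likelihood score, and lowering the threshold makes the defense stricter. A small sketch with made-up labels and scores (the data and threshold values are hypothetical):

```python
# (label, score) pairs from a hypothetical detector:
# label 1 = injection, 0 = safe; score = injection likelihood in [0, 1].
scored_prompts = [
    (1, 0.95), (1, 0.90), (1, 0.75), (1, 0.55), (1, 0.35),
    (0, 0.60), (0, 0.40), (0, 0.20), (0, 0.10), (0, 0.05),
]

def metrics_at(threshold):
    """Precision and recall when blocking every prompt scoring >= threshold."""
    tp = sum(1 for y, s in scored_prompts if y == 1 and s >= threshold)
    fp = sum(1 for y, s in scored_prompts if y == 0 and s >= threshold)
    fn = sum(1 for y, s in scored_prompts if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Strict (low threshold): catches everything, but blocks some safe prompts.
print(metrics_at(0.3))  # (0.714..., 1.0) -> high recall, lower precision
# Loose (high threshold): never blocks safe prompts, but misses injections.
print(metrics_at(0.7))  # (1.0, 0.6)      -> high precision, lower recall
```

Moving one threshold knob trades the two metrics against each other, which is why both must be reported together.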

What good vs bad metric values look like

Good defense:

  • Recall above 90% (most injections caught)
  • Precision above 85% (few safe prompts blocked)
  • Balanced F1 score above 0.87

Bad defense:

  • Recall below 70% (many injections missed)
  • Precision below 60% (many false blocks)
  • Low F1 score below 0.65
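These thresholds can be encoded as a simple release gate. A sketch, assuming the (hypothetical) target values listed above:

```python
# Hypothetical release gate using the "good defense" thresholds above.
def defense_is_acceptable(precision, recall, f1):
    return recall > 0.90 and precision > 0.85 and f1 > 0.87

print(defense_is_acceptable(precision=0.88, recall=0.93, f1=0.90))  # True
print(defense_is_acceptable(precision=0.55, recall=0.65, f1=0.60))  # False
```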
Common pitfalls in metrics for prompt injection defense
  • Accuracy paradox: If injections are rare, a model that always says "safe" can have high accuracy but zero recall.
  • Data leakage: Testing on prompts seen during training inflates metrics falsely.
  • Overfitting: Defense works well on test data but fails on new injection types.
  • Ignoring recall: Missing injections can cause serious harm, so recall must not be overlooked.
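The accuracy paradox from the list above is easy to demonstrate with a class-imbalanced example (the counts here are hypothetical):

```python
# 1000 prompts, only 20 of which are injections (hypothetical counts).
total, injections = 1000, 20

# A "detector" that labels every prompt as safe:
tp, fp = 0, 0          # it never predicts "injection"
fn = injections        # so it misses every injection
tn = total - injections

accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(accuracy)  # 0.98 -> looks excellent
print(recall)    # 0.0  -> catches zero injections
```

High accuracy on an imbalanced dataset says almost nothing about whether injections are actually being caught.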
Self-check question

Your prompt injection defense model has 98% accuracy but only 12% recall on injection prompts. Is it good for production? Why or why not?

Answer: No, it is not good. The low recall means it misses 88% of harmful injections, which can let attacks through. High accuracy is misleading here because injections are rare. The defense must catch most injections to be effective.

Key Result
High recall and precision are essential to effectively detect prompt injections while minimizing false blocks.