
Prompt injection defense in Prompt Engineering / GenAI - Model Metrics & Evaluation

Metrics & Evaluation - Prompt injection defense
Which metric matters for prompt injection defense and WHY

For prompt injection defense, the key metrics are Precision and Recall on detecting malicious prompts.

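Recall is the share of actual injections that the defense catches. It is crucial because every missed injection is an attack that reaches the AI.

Precision is the share of blocked prompts that really were injections. It matters because low precision means good user prompts are blocked by mistake, which hurts user experience.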

Balancing these two helps ensure the defense is effective without being too strict or too loose.

Confusion matrix for prompt injection detection
      |                  | Predicted Injection | Predicted Safe      |
      |------------------|---------------------|---------------------|
      | Actual Injection | True Positive (TP)  | False Negative (FN) |
      | Actual Safe      | False Positive (FP) | True Negative (TN)  |

      Example with 100 prompts:
      TP = 40 (correctly caught injections)
      FP = 10 (safe prompts wrongly blocked)
      TN = 45 (safe prompts correctly allowed)
      FN = 5  (injections missed)

      Total = 40 + 10 + 45 + 5 = 100
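
The counts above are enough to work out each metric by hand. The following is a minimal sketch of that arithmetic; the function names are illustrative and not part of any particular library.

```python
# Counts taken from the 100-prompt example above.
tp, fp, tn, fn = 40, 10, 45, 5


def precision(tp: int, fp: int) -> float:
    """Share of blocked prompts that really were injections."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual injections that were caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


p = precision(tp, fp)
r = recall(tp, fn)
acc = (tp + tn) / (tp + fp + tn + fn)

print(f"precision={p:.3f} recall={r:.3f} f1={f1_score(p, r):.3f} accuracy={acc:.3f}")
# precision=0.800 recall=0.889 f1=0.842 accuracy=0.850
```

So in this example 80% of blocked prompts were real injections, and about 89% of all injections were caught.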
    
Precision vs Recall tradeoff with examples

If the defense is very strict, it catches almost all injections (high recall) but blocks many safe prompts (low precision). This frustrates users.

If the defense is too loose, it lets many injections through (low recall) but rarely blocks safe prompts (high precision). This risks AI misuse.

Example:

  • High recall, low precision: 95% of injections caught, but 30% of safe prompts blocked.
  • High precision, low recall: 80% of safe prompts allowed, but only 60% of injections caught.

We want a balance that keeps the AI safe and users happy.
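
One way to see the tradeoff is to assume the defense outputs a score per prompt and blocks everything at or above a threshold. The sketch below uses made-up scores and labels (they are not real data, and the scoring detector itself is an assumption) to show how moving the threshold shifts precision and recall in opposite directions.

```python
# Hypothetical (score, is_injection) pairs from an imagined detector.
# A higher score means "more likely an injection". Illustrative only.
scored = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.65, True), (0.50, False), (0.40, True), (0.30, False),
    (0.20, False), (0.10, False),
]


def precision_recall(threshold: float) -> tuple[float, float]:
    """Block every prompt whose score is >= threshold, then score the result."""
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    # If nothing is blocked there are no false blocks, so report precision 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.35, 0.60, 0.85):
    p, r = precision_recall(threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
# threshold=0.35 precision=0.71 recall=1.00
# threshold=0.60 precision=0.80 recall=0.80
# threshold=0.85 precision=1.00 recall=0.40
```

The strict setting (low threshold) catches every injection but blocks more safe prompts; the loose setting (high threshold) never blocks a safe prompt but misses most injections.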

What good vs bad metric values look like

Good defense:

  • Recall above 90% (most injections caught)
  • Precision above 85% (few safe prompts blocked)
  • Balanced F1 score above 0.87

Bad defense:

  • Recall below 70% (many injections missed)
  • Precision below 60% (many false blocks)
  • Low F1 score below 0.65
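
The F1 thresholds above follow from the precision and recall thresholds, since F1 is the harmonic mean of the two. A small sketch of that check (the function name is illustrative):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Boundary of the "good" range: precision 0.85, recall 0.90.
print(round(f1(0.85, 0.90), 3))  # 0.874

# Boundary of the "bad" range: precision 0.60, recall 0.70.
print(round(f1(0.60, 0.70), 3))  # 0.646
```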
Common pitfalls in metrics for prompt injection defense
  • Accuracy paradox: If injections are rare, a model that always says "safe" can have high accuracy but zero recall.
  • Data leakage: Testing on prompts that were also seen during training falsely inflates the metrics.
  • Overfitting: Defense works well on test data but fails on new injection types.
  • Ignoring recall: Missing injections can cause serious harm, so recall must not be overlooked.
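
The accuracy paradox is easy to reproduce. The sketch below assumes a hypothetical evaluation set of 1,000 prompts with 2% injections and a "defense" that never blocks anything.

```python
# Hypothetical evaluation set: 20 injections among 1,000 prompts (2%).
labels = [True] * 20 + [False] * 980       # True = injection
predictions = [False] * len(labels)        # always predicts "safe"

tp = sum(1 for y, p in zip(labels, predictions) if y and p)
tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
fn = sum(1 for y, p in zip(labels, predictions) if y and not p)

accuracy = (tp + tn) / len(labels)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2%} recall={recall:.2%}")
# accuracy=98.00% recall=0.00%
```

The model looks excellent by accuracy alone while catching no injections at all, which is why recall has to be reported alongside it.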
Self-check question

Your prompt injection defense model has 98% accuracy but only 12% recall on injection prompts. Is it good for production? Why or why not?

Answer: No, it is not good. The low recall means it misses 88% of harmful injections, which can let attacks through. High accuracy is misleading here because injections are rare. The defense must catch most injections to be effective.
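
To make the answer concrete, here is one set of counts that would produce those numbers. The counts are hypothetical, chosen only so that accuracy comes out at 98% and recall at 12%.

```python
# Hypothetical counts for 10,000 prompts, picked to match the question.
tp, fn = 24, 176      # 200 injections, only 24 caught
fp, tn = 24, 9776     # 9,800 safe prompts, 24 wrongly blocked

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.0%} recall={recall:.0%} missed_injections={fn}")
# accuracy=98% recall=12% missed_injections=176
```

Under these assumed counts, 176 of 200 attacks get through even though the headline accuracy is 98%.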

Key Result
High recall and precision are essential to effectively detect prompt injections while minimizing false blocks.

Practice

1. What is the main purpose of prompt injection defense in AI systems?
easy
A. To protect AI from harmful or tricky user inputs
B. To improve AI's speed in processing data
C. To increase the size of the AI model
D. To reduce the cost of running AI models

Solution

  1. Step 1: Understand the role of prompt injection defense

    Prompt injection defense is designed to stop harmful or tricky inputs from confusing or misguiding the AI.
  2. Step 2: Compare options with this purpose

    Only option A, "To protect AI from harmful or tricky user inputs", matches this goal; the others relate to speed, size, or cost, which have nothing to do with input safety.
  3. Final Answer:

    To protect AI from harmful or tricky user inputs -> Option A
  4. Quick Check:

    Purpose of prompt injection defense = Protect AI from harmful inputs [OK]
Hint: Focus on defense meaning protection from bad inputs [OK]
Common Mistakes:
  • Confusing defense with performance improvement
  • Thinking it changes AI model size
  • Assuming it reduces costs
2. Which of the following is a correct way to implement a simple prompt injection defense filter in Python?
easy
A. if user_input = 'DROP TABLE': block_request()
B. if 'DROP TABLE' in user_input.upper(): block_request()
C. if user_input.contains('DROP TABLE'): block_request()
D. if user_input == 'drop table': block_request()

Solution

  1. Step 1: Check syntax for string containment in Python

    Python uses in to check if a substring exists in a string, and upper() helps catch case differences.
  2. Step 2: Evaluate each option's correctness

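    • Option B, if 'DROP TABLE' in user_input.upper(): block_request(), uses correct syntax and normalizes case before checking.
    • Option A, if user_input = 'DROP TABLE': block_request(), uses assignment (=) instead of comparison (==), which is a syntax error in an if condition.
    • Option C, if user_input.contains('DROP TABLE'): block_request(), calls contains, a method Python strings do not have.
    • Option D, if user_input == 'drop table': block_request(), only matches when the whole input is exactly that lowercase string, so it misses case variations and longer inputs.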
  3. Final Answer:

    if 'DROP TABLE' in user_input.upper(): block_request() -> Option B
  4. Quick Check:

    Use 'in' and upper() for case-insensitive check [OK]
Hint: Remember Python uses 'in' for substring checks [OK]
Common Mistakes:
  • Using '=' instead of '==' for comparison
  • Using non-existent string methods
  • Ignoring case sensitivity in checks
3. Given the code below, what will be the output if user_input = "Please DROP TABLE users"?
def block_request():
    return "Blocked"

def process_input(user_input):
    if 'DROP TABLE' in user_input.upper():
        return block_request()
    return "Allowed"

user_input = "Please DROP TABLE users"  # value given in the question
print(process_input(user_input))
medium
A. SyntaxError
B. Allowed
C. Blocked
D. None

Solution

  1. Step 1: Analyze the condition in process_input

    The input string uppercased is "PLEASE DROP TABLE USERS" which contains "DROP TABLE".
  2. Step 2: Determine which branch runs

    Since the condition is true, block_request() is called, returning "Blocked".
  3. Final Answer:

    Blocked -> Option C
  4. Quick Check:

    Input contains 'DROP TABLE' -> Blocked [OK]
Hint: Check if uppercase input contains 'DROP TABLE' [OK]
Common Mistakes:
  • Ignoring case and expecting 'Allowed'
  • Thinking code has syntax errors
  • Assuming function returns None by default
4. Identify the error in this prompt injection defense code snippet:
def check_input(text):
    if text.lower().find('delete'):
        return 'Blocked'
    return 'Allowed'
medium
A. The find method returns -1 if not found, so condition is wrong
B. Using lower() is incorrect for filtering
C. The function should return a boolean, not strings
D. The function is missing a parameter

Solution

  1. Step 1: Understand find method behavior

    find returns the index of the substring, or -1 if it is not found. In Python, -1 is truthy and 0 is falsy, so using the result directly as a condition gives the wrong answer.
  2. Step 2: Explain why this causes wrong logic

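    If 'delete' is not found, find returns -1 and the condition is true, so a safe input is blocked. If 'delete' is at the very start of the text, find returns 0 and the condition is false, so the input is allowed. The check should be explicit: text.lower().find('delete') != -1, or more simply 'delete' in text.lower().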
  3. Final Answer:

    The find method returns -1 if not found, so condition is wrong -> Option A
  4. Quick Check:

    Check find() != -1 for correct condition [OK]
Hint: Remember find() returns -1 if substring missing [OK]
Common Mistakes:
  • Assuming find() returns False when not found
  • Ignoring that -1 is truthy in Python
  • Thinking lower() is the error
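
The behavior described in this solution can be checked directly. The sketch below puts the snippet from the question next to one possible fix; the names check_input_buggy and check_input_fixed are illustrative.

```python
def check_input_buggy(text: str) -> str:
    # Same logic as the snippet in the question: the raw find() result
    # is used as a condition, so -1 counts as true and 0 as false.
    if text.lower().find('delete'):
        return 'Blocked'
    return 'Allowed'


def check_input_fixed(text: str) -> str:
    # `in` yields a real boolean, so there is no -1 / 0 confusion.
    if 'delete' in text.lower():
        return 'Blocked'
    return 'Allowed'


for sample in ("hello there", "Delete all records", "please delete this"):
    print(f"{sample!r}: buggy={check_input_buggy(sample)} fixed={check_input_fixed(sample)}")
# 'hello there': buggy=Blocked fixed=Allowed
# 'Delete all records': buggy=Allowed fixed=Blocked
# 'please delete this': buggy=Blocked fixed=Blocked
```

The buggy version is wrong in both directions: it blocks a harmless prompt and lets through one that begins with the keyword.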
5. You want to defend an AI prompt from injection attacks by blocking inputs containing any of these words: ['DROP', 'DELETE', 'SHUTDOWN']. Which code snippet correctly implements this defense?
hard
A. if user_input.upper() == 'DROP' or 'DELETE' or 'SHUTDOWN': block_request()
B. if all(word in user_input.upper() for word in ['DROP', 'DELETE', 'SHUTDOWN']): block_request()
C. if 'DROP' or 'DELETE' or 'SHUTDOWN' in user_input.upper(): block_request()
D. if any(word in user_input.upper() for word in ['DROP', 'DELETE', 'SHUTDOWN']): block_request()

Solution

  1. Step 1: Understand the goal to block if any word is present

    We want to block if at least one of the words appears in the input.
  2. Step 2: Evaluate each option's logic

    • Option D, if any(word in user_input.upper() for word in ['DROP', 'DELETE', 'SHUTDOWN']): block_request(), uses any() correctly to check whether at least one word is present.
    • Option B, if all(word in user_input.upper() for word in ['DROP', 'DELETE', 'SHUTDOWN']): block_request(), requires every word to be present, which is too strict.
    • Option C, if 'DROP' or 'DELETE' or 'SHUTDOWN' in user_input.upper(): block_request(), is valid syntax but wrong logic: the non-empty string 'DROP' is truthy, so the condition is always true.
    • Option A, if user_input.upper() == 'DROP' or 'DELETE' or 'SHUTDOWN': block_request(), compares the whole input to 'DROP' only, and the bare string 'DELETE' is truthy, so this condition is also always true.
  3. Final Answer:

    if any(word in user_input.upper() for word in ['DROP', 'DELETE', 'SHUTDOWN']): block_request() -> Option D
  4. Quick Check:

    Use any() to check multiple keywords [OK]
Hint: Use any() to check if any keyword is in input [OK]
Common Mistakes:
  • Using all() instead of any()
  • Incorrect or chaining causing always true
  • Comparing whole string instead of substring
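
The difference between the or chain and any() can be seen by running both on a harmless prompt. This is a minimal sketch; KEYWORDS and block_if_any are illustrative names, and a keyword list like this is only a toy filter.

```python
KEYWORDS = ['DROP', 'DELETE', 'SHUTDOWN']


def block_if_any(user_input: str) -> bool:
    """Return True when at least one keyword appears in the input."""
    upper = user_input.upper()
    return any(word in upper for word in KEYWORDS)


harmless = "What is the weather today?"

# Python reads this as: 'DROP' or 'DELETE' or ('SHUTDOWN' in ...).
# 'DROP' is a non-empty string, so the whole expression is 'DROP'.
chained = 'DROP' or 'DELETE' or 'SHUTDOWN' in harmless.upper()

print(repr(chained))                          # 'DROP'
print(bool(chained))                          # True
print(block_if_any(harmless))                 # False
print(block_if_any("please shutdown now"))    # True
print(block_if_any("Open my Dropbox folder")) # True
```

The last call shows a limit of plain substring matching: "Dropbox" contains "DROP", so a safe prompt is blocked. That is a false positive, and it lowers precision.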