Model Pipeline - Prompt injection defense
This pipeline shows how a language model defends against prompt injection attacks by detecting and filtering harmful inputs before generating safe responses.
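The detect, filter, respond flow described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual implementation: the pattern list, the function names (`detect_injection`, `sanitize`, `generate_response`, `run_pipeline`), and the stubbed model call are all assumptions made for the example. A trained detector would take the place of the simple pattern check.

```python
import re

# Hypothetical phrases that often signal an injection attempt (assumed for this sketch).
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal (the )?system prompt",
]

def detect_injection(prompt: str) -> bool:
    """Return True if any suspicious pattern appears in the prompt."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)

def sanitize(prompt: str) -> str:
    """Remove the suspicious phrases and collapse leftover whitespace."""
    for p in SUSPICIOUS_PATTERNS:
        prompt = re.sub(p, "", prompt, flags=re.IGNORECASE)
    return " ".join(prompt.split())

def generate_response(prompt: str) -> str:
    """Stand-in for the language model call (assumption: no real model here)."""
    return f"Model answer to: {prompt}"

def run_pipeline(prompt: str) -> str:
    """Detect, filter, then generate."""
    if detect_injection(prompt):
        prompt = sanitize(prompt)
        if not prompt:
            # Nothing legitimate is left after filtering.
            return "Blocked"
    return generate_response(prompt)
```

Blocking only when nothing remains after sanitization is one possible design choice; a stricter pipeline could reject every flagged prompt outright.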
[Figure: training loss (y-axis, 0.3 to 0.9) versus epoch (x-axis, epochs 1 to 5)]
| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 0.85 | 0.60 | Model starts learning to detect injection patterns |
| 2 | 0.65 | 0.75 | Detection accuracy improves, fewer false negatives |
| 3 | 0.50 | 0.85 | Model reliably flags suspicious prompts |
| 4 | 0.40 | 0.90 | Sanitization module learns to clean prompts effectively |
| 5 | 0.35 | 0.93 | Overall defense pipeline converges with high accuracy |
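... prompt injection defense in AI systems?

The correct check uses `in` to test whether a substring exists in a string, and `upper()` helps catch case differences. ... `contains`. By contrast, `if user_input == 'drop table': block_request()` checks only for an exact lowercase match, so it misses case variations and any input that contains more than those two words.

What does the following code print when `user_input = "Please DROP TABLE users"`?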
```python
def block_request():
    return "Blocked"

def process_input(user_input):
    # Upper-case the input so the check is case-insensitive.
    if 'DROP TABLE' in user_input.upper():
        return block_request()
    return "Allowed"

user_input = "Please DROP TABLE users"
print(process_input(user_input))
```

In `process_input`, the upper-cased input contains `'DROP TABLE'`, so `block_request()` is called, returning "Blocked".

```python
def check_input(text):
    # find() returns an index, not a boolean.
    if text.lower().find('delete'):
        return 'Blocked'
    return 'Allowed'
```

`find` method behavior: `find` returns the index of the substring, or -1 if it is not found. In Python, -1 is truthy, so the condition is true even when `'delete'` is absent, while a match at index 0 is falsy and slips through. The condition is therefore wrong (Option A).

... `['DROP', 'DELETE', 'SHUTDOWN']`. Which code snippet correctly implements this defense?

- The correct option uses `any()` to check for the presence of any of the words.
- `if all(word in user_input.upper() for word in ['DROP', 'DELETE', 'SHUTDOWN']): block_request()` requires all of the words to be present, which is too strict.
- `if 'DROP' or 'DELETE' or 'SHUTDOWN' in user_input.upper(): block_request()` is valid syntax but wrong logic: the non-empty string `'DROP'` is truthy, so the `or` chain always evaluates to true.
- `if user_input.upper() == 'DROP' or 'DELETE' or 'SHUTDOWN': block_request()` compares the whole input only to `'DROP'`, and the bare strings that follow make the condition always true as well.
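The two fixes discussed above can be combined in a short sketch. It assumes the blocked-word list from the question and reuses the names `block_request`, `check_input`, and `process_input` from the snippets; `BLOCKED_WORDS` and `contains_blocked_word` are hypothetical names introduced here. Plain substring matching is deliberately simple and would also flag harmless words that merely contain a blocked word.

```python
BLOCKED_WORDS = ['DROP', 'DELETE', 'SHUTDOWN']  # list taken from the question

def block_request():
    return "Blocked"

def check_input(text):
    # Use `in` instead of the truthiness of find(): find() returns -1
    # (truthy) when the substring is absent and 0 (falsy) for a match at index 0.
    if 'delete' in text.lower():
        return 'Blocked'
    return 'Allowed'

def contains_blocked_word(user_input):
    # any() is True as soon as one blocked word occurs in the upper-cased input.
    upper = user_input.upper()
    return any(word in upper for word in BLOCKED_WORDS)

def process_input(user_input):
    if contains_blocked_word(user_input):
        return block_request()
    return "Allowed"

print(check_input("delete everything"))      # Blocked
print(check_input("hello"))                  # Allowed
print(process_input("please shutdown now"))  # Blocked
print(process_input("What's the weather?"))  # Allowed
```

An equivalent repair for `check_input` keeps `find` and compares explicitly: `text.lower().find('delete') != -1`.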