Model Pipeline - Prompt injection attacks
This pipeline shows how a language model processes input prompts and how prompt injection attacks can manipulate the output by injecting harmful instructions.
This pipeline shows how a language model processes input prompts and how prompt injection attacks can manipulate the output by injecting harmful instructions.
Loss
2.3 |**************
1.5 |********
0.9 |*****
0.5 |***
0.3 |**
----------------
Epochs 1 to 20
| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 2.3 | 0.1 | Initial training with random outputs, high loss and low accuracy. |
| 5 | 1.5 | 0.45 | Model starts learning basic language patterns. |
| 10 | 0.9 | 0.7 | Model improves understanding of instructions. |
| 15 | 0.5 | 0.85 | Model reliably follows prompts but vulnerable to injection. |
| 20 | 0.3 | 0.92 | Model achieves high accuracy but prompt injection risk remains. |