Model Pipeline - Vision Transformer (ViT)
The Vision Transformer (ViT) model splits an image into small fixed-size patches, flattens them into a sequence, and feeds that sequence to a transformer that learns patterns for image classification.
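The patch-splitting step can be sketched with NumPy. This is a minimal illustration rather than the pipeline's actual code: the function name `image_to_patch_sequence`, the 32x32x3 image size, and the patch size of 8 are all assumptions chosen for the example, and the learned linear projection and position embeddings that a full ViT applies afterwards are left out.

```python
import numpy as np


def image_to_patch_sequence(image, patch_size):
    """Split an (H, W, C) image into non-overlapping square patches and
    flatten each patch into one row of the returned sequence."""
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # Expose the patch grid as separate axes: (rows, p, cols, p, C).
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    # Bring the two grid axes to the front: (rows, cols, p, p, C).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One flattened vector per patch, in row-major patch order.
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


# Hypothetical input: a 32x32 RGB image filled with dummy values.
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
sequence = image_to_patch_sequence(image, patch_size=8)
print(sequence.shape)  # (16, 192): 16 patches, each 8 * 8 * 3 values
```

Each row of `sequence` plays the role of one token, so the transformer sees the image as a sequence of 16 vectors instead of a 2D grid.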
Loss curve: training loss (y-axis) versus epochs (x-axis), falling from 2.3 at epoch 1 through 1.5, 0.9, and 0.6 to 0.45 at epoch 20.

| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 2.30 | 0.12 | Starting training, loss high, accuracy low |
| 5 | 1.50 | 0.45 | Model learning basic features, accuracy improving |
| 10 | 0.90 | 0.70 | Good progress, model captures complex patterns |
| 15 | 0.60 | 0.82 | Loss decreasing steadily, accuracy high |
| 20 | 0.45 | 0.88 | Training converging, model performs well |
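
One way to check the "converging" observation is to compute how fast the loss falls between the logged epochs. The sketch below uses only the epoch and loss values from the table; the names `checkpoints` and `loss_drop_per_epoch` are illustrative and not part of any library.

```python
# (epoch, loss) pairs copied from the table above.
checkpoints = [(1, 2.30), (5, 1.50), (10, 0.90), (15, 0.60), (20, 0.45)]


def loss_drop_per_epoch(points):
    """Average loss decrease per epoch between consecutive logged epochs."""
    rates = []
    for (epoch_a, loss_a), (epoch_b, loss_b) in zip(points, points[1:]):
        rates.append((loss_a - loss_b) / (epoch_b - epoch_a))
    return rates


rates = loss_drop_per_epoch(checkpoints)
for (start, _), (end, _), rate in zip(checkpoints, checkpoints[1:], rates):
    print(f"epochs {start:>2}-{end:>2}: {rate:.3f} loss drop per epoch")
```

The drop per epoch shrinks at every interval (roughly 0.20, 0.12, 0.06, then 0.03), which is the flattening that the table describes as convergence.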