Model Pipeline - Multimodal RAG
Multimodal RAG (retrieval-augmented generation) answers questions over both text and images: it retrieves the relevant items of each type, then generates an answer grounded in what was retrieved.
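The retrieve-then-generate flow described above can be sketched roughly as follows. This is a toy illustration, not the pipeline's actual implementation: the `Item` type, the bag-of-words `embed` function, the use of captions as a stand-in for image features, and the sample corpus and file names are all assumptions made for the example. A real system would presumably use a shared text/image encoder and pass the prompt to a generator model.

```python
import math
from collections import Counter
from typing import List, NamedTuple


class Item(NamedTuple):
    kind: str     # "text" or "image"
    content: str  # passage text, or a caption standing in for image features
    source: str   # hypothetical identifier such as a file name


def embed(text: str) -> Counter:
    # Toy embedding (assumption): bag-of-words counts instead of a learned encoder.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: List[Item], k: int = 2) -> List[Item]:
    # Rank text and image items together by similarity to the query.
    q = embed(query)
    ranked = sorted(corpus, key=lambda it: cosine(q, embed(it.content)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: List[Item]) -> str:
    # The generator would receive this prompt; generation itself is out of scope here.
    context = "\n".join(f"[{h.kind}] {h.source}: {h.content}" for h in hits)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical sample data, for illustration only.
corpus = [
    Item("text", "The Eiffel Tower is 330 metres tall", "doc1.txt"),
    Item("image", "photo of the eiffel tower at night", "img1.jpg"),
    Item("text", "Bananas are rich in potassium", "doc2.txt"),
]
query = "how tall is the eiffel tower"
hits = retrieve(query, corpus, k=2)
prompt = build_prompt(query, hits)
print(prompt)
```

Ranking text and image items in one list keeps the sketch short; a pipeline could just as well retrieve from separate text and image indexes and merge the results.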
Loss per epoch:

Epoch 1: ************ (1.2)
Epoch 2: ********* (0.9)
Epoch 3: ******* (0.7)
Epoch 4: ***** (0.55)
Epoch 5: **** (0.45)

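The text bar chart above can be regenerated from the loss values with a few lines of Python. The scaling rule used here (ten stars per unit of loss, rounded down) is inferred from the bar lengths shown, not stated in the source, so treat it as an assumption.

```python
import math

# Loss values as reported for epochs 1 to 5.
losses = {1: 1.2, 2: 0.9, 3: 0.7, 4: 0.55, 5: 0.45}


def bar(loss: float, stars_per_unit: int = 10) -> str:
    # Assumed scaling: floor(loss * 10) stars. The small epsilon guards
    # against floating-point results such as 6.999999999999999.
    return "*" * math.floor(loss * stars_per_unit + 1e-9)


lines = [f"Epoch {epoch}: {bar(loss)} ({loss})" for epoch, loss in losses.items()]
print("\n".join(lines))
```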
| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 1.2 | 0.45 | Model starts learning, loss high, accuracy low |
| 2 | 0.9 | 0.60 | Loss decreases, accuracy improves |
| 3 | 0.7 | 0.72 | Model learns better multimodal relations |
| 4 | 0.55 | 0.80 | Loss continues to drop, accuracy rises |
| 5 | 0.45 | 0.85 | Good convergence, model ready for predictions |
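One way to check the convergence noted in the last row is to look at how much each epoch improves on the previous one. The sketch below uses only the values from the table; the `history` and `deltas` names are chosen for the example.

```python
# (epoch, loss, accuracy) rows copied from the table above.
history = [
    (1, 1.2, 0.45),
    (2, 0.9, 0.60),
    (3, 0.7, 0.72),
    (4, 0.55, 0.80),
    (5, 0.45, 0.85),
]

# Epoch-to-epoch change: how far the loss dropped and how much accuracy rose.
deltas = []
for (_, prev_loss, prev_acc), (epoch, loss, acc) in zip(history, history[1:]):
    deltas.append((epoch, round(prev_loss - loss, 2), round(acc - prev_acc, 2)))

for epoch, loss_drop, acc_gain in deltas:
    print(f"Epoch {epoch}: loss -{loss_drop:.2f}, accuracy +{acc_gain:.2f}")
```

Both the loss drop and the accuracy gain get smaller at every step, which is the usual sign that training is levelling off.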