Model Pipeline - Evaluating generated text (BLEU, ROUGE)
This pipeline shows how to check the quality of machine-generated text by comparing it against a reference (a known-good example) using BLEU and ROUGE scores. BLEU is precision-oriented (how many of the generated n-grams appear in the reference), while ROUGE is recall-oriented (how much of the reference is captured by the generated text).
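As a concrete illustration, here is a minimal sketch that scores one generated sentence against one reference, assuming the `nltk` and `rouge-score` packages are installed; the example sentences are hypothetical, not data from this pipeline.

```python
# Sketch: score a single generated sentence against one reference.
# Assumes `nltk` and `rouge-score` are installed (pip install nltk rouge-score);
# the sentences below are illustrative only.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "the cat sat on the mat"
generated = "the cat is on the mat"

# BLEU: n-gram precision of the candidate against the reference.
# Smoothing avoids a zero score when a higher-order n-gram has no match.
bleu = sentence_bleu(
    [reference.split()],   # one tokenized reference (a list of references is allowed)
    generated.split(),     # tokenized candidate
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE: recall-oriented overlap. rouge1 counts unigram overlap;
# rougeL uses the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, generated)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}")
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```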
Loss per epoch (bar length proportional to loss; values match the table below)
Epoch 1: 0.85 |*****************
Epoch 2: 0.65 |*************
Epoch 3: 0.50 |**********
Epoch 4: 0.40 |********
Epoch 5: 0.35 |*******
| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 0.85 | 0.40 | Initial evaluation: low BLEU and ROUGE scores indicate poor text quality. |
| 2 | 0.65 | 0.55 | Scores improve as the model learns to generate text closer to the references. |
| 3 | 0.50 | 0.68 | Better matching of n-grams and sequences is reflected in higher BLEU and ROUGE. |
| 4 | 0.40 | 0.75 | The model generates more fluent and relevant text; scores continue to rise. |
| 5 | 0.35 | 0.80 | Training converges; BLEU and ROUGE scores indicate good generation quality. |
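To produce per-epoch numbers like those in the table, the validation set would be scored in aggregate after each epoch. Below is a minimal sketch of such an aggregation step, using the same `nltk` and `rouge-score` packages; `evaluate_epoch` and its inputs are hypothetical names chosen for illustration, not part of the pipeline above.

```python
# Sketch: aggregate BLEU and ROUGE-L over a validation set after an epoch.
# Assumes `nltk` and `rouge-score` are installed; all names are illustrative.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from rouge_score import rouge_scorer

def evaluate_epoch(references: list[str], candidates: list[str]) -> tuple[float, float]:
    """Return (corpus BLEU, mean ROUGE-L F1) for paired references/candidates."""
    # corpus_bleu expects, for each candidate, a *list* of tokenized references.
    bleu = corpus_bleu(
        [[ref.split()] for ref in references],
        [cand.split() for cand in candidates],
        smoothing_function=SmoothingFunction().method1,
    )
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge_l = sum(
        scorer.score(ref, cand)["rougeL"].fmeasure
        for ref, cand in zip(references, candidates)
    ) / len(references)
    return bleu, rouge_l

# Example: score a tiny validation set (hypothetical data).
refs = ["the cat sat on the mat", "dogs bark at strangers"]
cands = ["the cat is on the mat", "dogs bark at people"]
bleu, rouge_l = evaluate_epoch(refs, cands)
print(f"corpus BLEU: {bleu:.3f}, mean ROUGE-L F1: {rouge_l:.3f}")
```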