Introduction
When you improve a machine learning model by fine-tuning it, you need to check whether it actually got better. Evaluation measures how well the fine-tuned model performs on held-out data that reflects the tasks it will face in real use.
Imagine you practice a speech to improve it. After practicing, you ask friends to listen and give feedback on how clear and engaging it is. Their feedback helps you know if your practice worked or if you need more changes.
┌─────────────────────────────┐
│      Fine-tuned Model       │
└──────────────┬──────────────┘
               │
       ┌───────▼────────┐
       │   Test Data    │
       └───────┬────────┘
               │
       ┌───────▼────────┐
       │  Evaluation    │
       │  Metrics &     │
       │  Human Review  │
       └───────┬────────┘
               │
       ┌───────▼────────┐
       │  Performance   │
       │    Results     │
       └────────────────┘
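The flow above can be sketched in a few lines of Python. This is a minimal, self-contained illustration, not a real evaluation harness: `fine_tuned_model` is a hypothetical placeholder standing in for a real model's prediction function, and the test data is invented for the example. The point is the shape of the pipeline: run the model on held-out test data, then score the predictions with a metric.

```python
# Hypothetical stand-in for a real fine-tuned model's prediction function.
def fine_tuned_model(text):
    return "positive" if "good" in text else "negative"

# Held-out test data the model never saw during fine-tuning
# (invented examples for illustration).
test_data = [
    ("the movie was good", "positive"),
    ("a good experience overall", "positive"),
    ("terrible plot and acting", "negative"),
    ("not worth watching", "negative"),
]

# Evaluation metric: accuracy on the test set.
correct = sum(fine_tuned_model(text) == label for text, label in test_data)
accuracy = correct / len(test_data)
print(f"accuracy = {accuracy:.2f}")
```

In practice you would replace the placeholder with your actual model, use a much larger test set, and often combine automatic metrics like this with human review of sampled outputs.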