Introduction
When machines generate text, images, or decisions, we need a way to check whether the results are actually good. Human evaluation frameworks address this by having people judge a system's outputs against defined quality criteria, which shows how well the system performs in ways automated checks can miss.
Imagine a new recipe being tested. While a machine can check if the ingredients are correct, only a person can taste the dish and say if it is delicious, balanced, or needs more salt. Human evaluation frameworks are like food critics who judge the final dish to help the chef improve.
┌───────────────────────────────┐
│       Human Evaluation        │
├───────────────┬───────────────┤
│ Criteria      │ Methods       │
│ (Quality)     │ (Rating, Rank)│
├───────────────┴───────────────┤
│    Challenges & Solutions     │
│ (Cost, Consistency, Multiple  │
│ Evaluators)                   │
├───────────────────────────────┤
│     Feedback to AI Models     │
└───────────────────────────────┘