Recall & Review
beginner
What is the main purpose of human evaluation frameworks in AI?
Human evaluation frameworks help measure how well AI systems perform by using human judgment to assess qualities like accuracy, relevance, and user satisfaction.
beginner
Name two common criteria used in human evaluation frameworks for AI outputs.
Common criteria include fluency (how natural the output sounds) and relevance (how well the output matches the input or task).
intermediate
Why is inter-rater reliability important in human evaluation?
Inter-rater reliability measures how consistently different human evaluators score the same outputs; high agreement makes the results more trustworthy and less dependent on any one evaluator's biases.
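One common way to quantify inter-rater agreement is Cohen's kappa, which corrects raw agreement for agreement expected by chance. Below is a minimal from-scratch sketch in Python; the two raters' labels are invented example data, not from any real study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: derived from each rater's own label distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two evaluators judging the same 8 AI outputs.
a = ["good", "good", "bad", "good", "bad", "good", "bad", "bad"]
b = ["good", "good", "bad", "bad", "bad", "good", "good", "bad"]
print(round(cohens_kappa(a, b), 2))  # → 0.5
```

A kappa near 1 indicates strong agreement, near 0 means agreement no better than chance; values around 0.5, as here, suggest the rating guidelines may need tightening.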
beginner
Describe a simple human evaluation method for text generation models.
A simple method is to ask multiple people to rate generated sentences on a scale (e.g., 1 to 5) for qualities like clarity and correctness, then average the scores.
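The rate-and-average method described above can be sketched in a few lines of Python; the sentence names and 1-to-5 ratings below are invented for illustration.

```python
# Hypothetical 1-5 clarity ratings from three evaluators for four outputs.
ratings = {
    "sentence_1": [4, 5, 4],
    "sentence_2": [2, 3, 2],
    "sentence_3": [5, 5, 4],
    "sentence_4": [3, 3, 3],
}

# Average each sentence's ratings across evaluators.
mean_scores = {s: sum(r) / len(r) for s, r in ratings.items()}

# Report outputs from highest to lowest mean score.
for sentence, score in sorted(mean_scores.items(), key=lambda kv: -kv[1]):
    print(f"{sentence}: {score:.2f}")
```

Averaging across several evaluators smooths out individual quirks, which is exactly why multiple raters are used instead of one.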
intermediate
What is a limitation of human evaluation frameworks?
They can be time-consuming, costly, and sometimes subjective, which means results might vary depending on who evaluates and when.
What does inter-rater reliability measure in human evaluation?
Inter-rater reliability measures whether different people give similar scores to the same outputs, which indicates the evaluation is consistent.
Which of the following is NOT a typical criterion in human evaluation of AI outputs?
Model training time is a technical metric, not a human evaluation criterion.
Why might human evaluation be preferred over automated metrics?
Humans can understand nuances like meaning and style that automated metrics may miss.
What is a common scale used in human evaluation ratings?
A 1 to 5 scale is simple and widely used for rating quality.
Which factor can reduce the reliability of human evaluation?
If evaluators interpret criteria differently, scores become inconsistent.
Explain what human evaluation frameworks are and why they are important in AI.
Think about how humans check AI outputs for quality.
Describe how inter-rater reliability affects the trustworthiness of human evaluation results.
Consider what happens if evaluators disagree a lot.