
Human evaluation frameworks in Prompt Engineering / GenAI - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the main purpose of human evaluation frameworks in AI?
Human evaluation frameworks help measure how well AI systems perform by using human judgment to assess qualities like accuracy, relevance, and user satisfaction.
beginner
Name two common criteria used in human evaluation frameworks for AI outputs.
Common criteria include fluency (how natural the output sounds) and relevance (how well the output matches the input or task).
intermediate
Why is inter-rater reliability important in human evaluation?
Inter-rater reliability measures how consistently different human evaluators score the same outputs. High agreement makes the evaluation results more trustworthy and less dependent on any single evaluator's bias.
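One common way to quantify agreement between two evaluators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal illustration, not a prescribed method: the function name `cohens_kappa` and the example ratings are assumptions made up for this card, and the scores are treated as unordered categories.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (labels treated as categories)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability both raters pick the same label independently.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used one identical label throughout.
    return (observed - expected) / (1 - expected)

# Hypothetical 1-to-5 ratings from two evaluators on six outputs.
rater_a = [5, 4, 4, 2, 1, 3]
rater_b = [5, 4, 3, 2, 1, 3]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))  # about 0.79
```

A kappa near 1 indicates strong agreement, while a value near 0 means the evaluators agree no more often than chance would predict.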
beginner
Describe a simple human evaluation method for text generation models.
A simple method is to ask several people to rate each generated sentence on a fixed scale (e.g., 1 to 5) for qualities like clarity and correctness, then average the scores for each quality.
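The rating-and-averaging method can be sketched in a few lines. This is only an illustration under assumed data: the output IDs, the criteria names, and the scores are hypothetical, and `average_ratings` is a helper name invented for this example.

```python
from statistics import mean

# Hypothetical data: each output was rated 1 to 5 by three evaluators per criterion.
ratings = {
    "output_1": {"clarity": [4, 5, 3], "correctness": [3, 4, 5]},
    "output_2": {"clarity": [2, 3, 1], "correctness": [5, 5, 2]},
}

def average_ratings(ratings):
    """Return the mean score for every output and criterion."""
    return {
        output_id: {criterion: mean(scores) for criterion, scores in by_criterion.items()}
        for output_id, by_criterion in ratings.items()
    }

averages = average_ratings(ratings)
for output_id, scores in averages.items():
    print(output_id, scores)
```

Keeping the per-rater scores, instead of storing only the averages, makes it possible to check later how much the evaluators disagreed.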
intermediate
What is a limitation of human evaluation frameworks?
They can be time-consuming, costly, and sometimes subjective, which means results might vary depending on who evaluates and when.
What does inter-rater reliability measure in human evaluation?
A. Speed of AI model predictions
B. Consistency between different human evaluators
C. Number of evaluation criteria used
D. Accuracy of automated metrics
Which of the following is NOT a typical criterion in human evaluation of AI outputs?
A. Coherence
B. Relevance
C. Fluency
D. Model training time
Why might human evaluation be preferred over automated metrics?
A. Humans are faster than machines
B. Automated metrics are always inaccurate
C. Humans can judge quality aspects that machines cannot easily measure
D. Human evaluation is cheaper
What is a common scale used in human evaluation ratings?
A. 1 to 5
B. 0 to 1000
C. True or False
D. A to Z
Which factor can reduce the reliability of human evaluation?
A. Different interpretations by evaluators
B. Using multiple evaluators
C. Clear evaluation guidelines
D. Randomizing evaluation order
Explain what human evaluation frameworks are and why they are important in AI.
Think about how humans check AI outputs for quality.
Describe how inter-rater reliability affects the trustworthiness of human evaluation results.
Consider what happens if evaluators disagree a lot.