beginner

What is the main purpose of human evaluation frameworks in AI?

Human evaluation frameworks help measure how well AI systems perform by using human judgment to assess qualities like accuracy, relevance, and user satisfaction.

beginner

Name two common criteria used in human evaluation frameworks for AI outputs.

Common criteria include fluency (how natural the output sounds) and relevance (how well the output matches the input or task).

intermediate

Why is inter-rater reliability important in human evaluation?

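Inter-rater reliability measures how consistently different human evaluators score the same outputs. High agreement suggests the ratings reflect the outputs themselves rather than one evaluator's personal interpretation, which makes the evaluation results more trustworthy.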
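
One common way to quantify agreement between two evaluators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is illustrative only: the function name `cohens_kappa` and the sample labels are assumptions, and the ratings are treated as unordered categories (for ordinal scales such as 1 to 5, a weighted variant is often a better fit).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label for every item.
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical labels: 1 = acceptable output, 0 = unacceptable output.
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
print(kappa)
```

A value near 1 indicates strong agreement, a value near 0 indicates agreement no better than chance.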

beginner

Describe a simple human evaluation method for text generation models.

A simple method is to ask multiple people to rate generated sentences on a scale (e.g., 1 to 5) for qualities like clarity and correctness, then average the scores.
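
The averaging step described above can be sketched in a few lines. This is a minimal illustration, not a full evaluation pipeline: the `ratings` data and the `average_ratings` helper are hypothetical, and each sentence is assumed to have been rated by the same number of people on a 1 to 5 scale.

```python
from statistics import mean

# Hypothetical ratings: each generated sentence was scored 1-5 by three raters.
ratings = {
    "sentence_1": [4, 5, 3],
    "sentence_2": [2, 3, 2],
}


def average_ratings(ratings_by_item):
    """Return the mean rating for each rated item."""
    return {item: mean(scores) for item, scores in ratings_by_item.items()}


averages = average_ratings(ratings)
for item, score in averages.items():
    print(f"{item}: {score:.2f}")
```

Averaging hides disagreement between raters, so it is usually worth checking the spread of scores alongside the mean.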

intermediate

What is a limitation of human evaluation frameworks?

They can be time-consuming, costly, and sometimes subjective, which means results might vary depending on who evaluates and when.

What does inter-rater reliability measure in human evaluation?

A. Speed of AI model predictions

B. Consistency between different human evaluators

C. Number of evaluation criteria used

D. Accuracy of automated metrics

Which of the following is NOT a typical criterion in human evaluation of AI outputs?

A. Coherence

B. Relevance

C. Fluency

D. Model training time

Why might human evaluation be preferred over automated metrics?

A. Humans are faster than machines

B. Automated metrics are always inaccurate

C. Humans can judge quality aspects that machines cannot easily measure

D. Human evaluation is cheaper

What is a common scale used in human evaluation ratings?

A. 1 to 5

B. 0 to 1000

C. True or False

D. A to Z

Which factor can reduce the reliability of human evaluation?

A. Different interpretations by evaluators

B. Using multiple evaluators

C. Clear evaluation guidelines

D. Randomizing evaluation order

Explain what human evaluation frameworks are and why they are important in AI.

Describe how inter-rater reliability affects the trustworthiness of human evaluation results.