Recall & Review
beginner
What does BLEU score measure in text generation?
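BLEU (Bilingual Evaluation Understudy) measures how closely generated text matches one or more reference texts. It computes clipped n-gram precision (typically for n = 1 to 4), combines those precisions with a geometric mean, and applies a brevity penalty to outputs that are shorter than the reference.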
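A minimal sketch of a sentence-level BLEU-style score may make the pieces concrete. It assumes whitespace-tokenized input, a single reference, and n-grams up to `max_n=2`; the name `simple_bleu` is hypothetical, and real implementations add smoothing and corpus-level aggregation that are omitted here.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count each contiguous n-token sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=2):
    """Illustrative BLEU-style score: geometric mean of clipped
    n-gram precisions, multiplied by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Counter intersection keeps the minimum count, which clips matches.
        matched = sum((cand & ref).values())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        precisions.append(matched / total)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c, r = len(candidate), len(reference)
    # Penalize candidates that are shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * geo_mean

reference = "the cat sat on the mat".split()
candidate = "the cat sat".split()
print(round(simple_bleu(candidate, reference), 3))  # 0.368
```

Every n-gram in the short candidate appears in the reference, so both precisions are 1.0. The score drops only because of the brevity penalty.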
beginner
What is ROUGE used for in evaluating generated text?
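ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures the overlap between generated text and reference summaries, with an emphasis on recall. Its variants count shared n-grams (ROUGE-N), longest common subsequences of words (ROUGE-L), and skip-bigrams, which are word pairs that may have gaps between them (ROUGE-S).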
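The "word sequences" variant can be sketched as a longest-common-subsequence recall in the style of ROUGE-L. This is a simplified illustration that assumes whitespace tokenization and one reference; the function names are hypothetical, and a full ROUGE-L also reports precision and an F-measure.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_recall(candidate, reference):
    """Fraction of the reference covered by the longest in-order match."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)

reference = "the cat sat on the mat".split()
candidate = "the cat lay on the mat".split()
print(round(rouge_l_recall(candidate, reference), 3))  # 0.833
```

Here the longest in-order match is "the cat on the mat", which is 5 of the 6 reference words.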
intermediate
Why is BLEU considered precision-oriented while ROUGE is recall-oriented?
BLEU asks what fraction of the n-grams in the generated text also appear in the reference (precision). ROUGE asks what fraction of the n-grams in the reference also appear in the generated text (recall).
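The difference shows up when both ratios are computed on the same pair of texts. The sketch below uses unigrams only and assumes non-empty, whitespace-tokenized input; the helper names are illustrative and do not come from any particular library.

```python
from collections import Counter

def overlap(candidate, reference):
    """Count unigrams shared by both texts, respecting repeat counts."""
    return sum((Counter(candidate) & Counter(reference)).values())

def unigram_precision(candidate, reference):
    # Denominator: words in the generated text (the BLEU-style view).
    return overlap(candidate, reference) / len(candidate)

def unigram_recall(candidate, reference):
    # Denominator: words in the reference (the ROUGE-style view).
    return overlap(candidate, reference) / len(reference)

reference = "the cat sat on the mat".split()
candidate = "the cat sat".split()

print(unigram_precision(candidate, reference))  # 1.0
print(unigram_recall(candidate, reference))     # 0.5
```

The short candidate is perfectly precise, since every word it contains is in the reference, but it covers only half of the reference.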
beginner
What is an n-gram in the context of BLEU and ROUGE?
An n-gram is a contiguous sequence of n words. For example, a 2-gram (bigram) is two consecutive words, such as "the cat". Both BLEU and ROUGE count how many of these sequences the generated and reference texts share.
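Extracting n-grams takes only a few lines. The sketch below assumes the text has already been split into words; `ngrams` is an illustrative helper name.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat".split()
print(ngrams(tokens, 1))  # [('the',), ('cat',), ('sat',)]
print(ngrams(tokens, 2))  # [('the', 'cat'), ('cat', 'sat')]
```

A text with fewer than n words yields an empty list, because the range is empty.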
intermediate
How does BLEU handle multiple reference texts?
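BLEU counts an n-gram in the generated text as a match if it appears in any of the references, and it clips each n-gram's count at the highest number of times that n-gram occurs in any single reference. This lets several valid phrasings earn credit without rewarding a word that is repeated more often than any reference uses it.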
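Clipping against several references can be sketched as follows. The function name `clipped_precision` is hypothetical, input is assumed to be pre-tokenized, and the example is the well-known degenerate case of a candidate that repeats a single word.

```python
from collections import Counter

def clipped_precision(candidate, references, n=1):
    """Modified n-gram precision with counts clipped across references."""
    def ngram_counts(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngram_counts(candidate)
    # For each n-gram, keep the largest count seen in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

candidate = "the the the the the the the".split()
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
print(clipped_precision(candidate, references))  # 2/7, about 0.286
```

Without clipping, all seven occurrences of "the" would count as matches and the precision would be 1.0. With clipping, only two are credited, because no single reference contains "the" more than twice.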
What does a high BLEU score indicate?
A high BLEU score means the generated text shares many n-grams with the reference, indicating close similarity.
Which metric is more focused on recall in text evaluation?
ROUGE emphasizes recall by measuring how much of the reference text is captured by the generated text.
What is an n-gram?
An n-gram is a contiguous sequence of n words used to compare texts.
Which of these is true about BLEU?
BLEU is precision-oriented: it checks what fraction of the n-grams in the generated text also appear in the reference.
ROUGE is commonly used to evaluate which type of generated text?
ROUGE is widely used to evaluate summaries by comparing them to reference summaries.
Explain how BLEU and ROUGE differ in evaluating generated text.
Think about what each metric focuses on: matching generated text vs. covering reference text.
Describe what an n-gram is and why it is important for BLEU and ROUGE.
Consider how small word groups help check if texts are similar.