Recall & Review
beginner
What does ROUGE stand for in NLP evaluation?
ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It is a set of metrics used to evaluate automatic summarization and machine translation by comparing system-generated text to reference texts.
beginner
What is the main purpose of ROUGE metrics?
ROUGE metrics measure how much the words or phrases (n-grams) in a machine-generated summary overlap with those in one or more human-written reference summaries. As the name suggests, ROUGE was originally recall-oriented, but most implementations now report precision, recall, and F1 score for each ROUGE variant.
intermediate
Explain ROUGE-N metric.
ROUGE-N measures the overlap of n-grams (contiguous sequences of n words) between the candidate summary and the reference summary. For example, ROUGE-1 compares single words (unigrams), and ROUGE-2 compares pairs of adjacent words (bigrams).
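A minimal Python sketch of ROUGE-N, assuming simple lowercase whitespace tokenization and a single reference (established toolkits add their own tokenization, optional stemming, and multi-reference handling). The helper names `ngrams` and `rouge_n` are illustrative, not taken from any library. Shared n-gram counts are clipped, so a word repeated in the candidate cannot be matched more often than it occurs in the reference.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of contiguous n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Sketch of ROUGE-N: clipped n-gram overlap, returned as (P, R, F1).

    Assumption: lowercase + whitespace tokenization, one reference only.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter '&' keeps the minimum count of each shared n-gram (clipping).
    overlap = sum((cand & ref).values())
    # Precision: share of candidate n-grams found in the reference.
    precision = overlap / sum(cand.values()) if cand else 0.0
    # Recall: share of reference n-grams found in the candidate.
    recall = overlap / sum(ref.values()) if ref else 0.0
    # F1: harmonic mean; defined as 0 when nothing overlaps.
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1
```

With the reference "the cat sat on the mat" and the candidate "the cat is on the mat", ROUGE-1 precision, recall, and F1 are all 5/6 (five of six unigrams match), while ROUGE-2 drops to 0.6 because only three of the five bigrams ("the cat", "on the", "the mat") match.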
intermediate
What is ROUGE-L and why is it useful?
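ROUGE-L is based on the longest common subsequence (LCS) between the candidate and reference summaries: the longest sequence of words that appears in both in the same order, though not necessarily consecutively. Because it rewards in-order matches without requiring them to be adjacent, it captures sentence-level word order without fixing an n-gram length in advance.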
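A minimal Python sketch of ROUGE-L, under the same assumptions (lowercase whitespace tokenization, a single reference) and using the common F-measure with β = 1; the function names `lcs_length` and `rouge_l` are hypothetical. The LCS length comes from the standard dynamic-programming table.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend the match
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(candidate, reference):
    """Sketch of ROUGE-L as (P, R, F1); assumes beta = 1 in the F-measure."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)  # LCS relative to candidate length
    recall = lcs / len(ref)      # LCS relative to reference length
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For the reference "police killed the gunman", the candidates "police kill the gunman" and "the gunman kill police" each share exactly one bigram ("the gunman") with it, so their ROUGE-2 scores are equal. ROUGE-L separates them: the first keeps "police the gunman" in order (LCS = 3, F1 = 0.75), while the second keeps only "the gunman" (LCS = 2, F1 = 0.5).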
beginner
How are precision, recall, and F1 score used in ROUGE metrics?
Precision is the fraction of words (or n-grams) in the candidate summary that also appear in the reference. Recall is the fraction of words (or n-grams) in the reference that also appear in the candidate. The F1 score is the harmonic mean of precision and recall, combining them into a single quality score that stays low whenever either one is low.
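In the usual formulation, with "overlap" meaning the clipped count of matching n-grams, the three quantities are given below. The worked numbers use an illustrative two-word candidate "the cat" scored against the reference "the cat sat on the mat" with ROUGE-1.

```latex
P = \frac{\text{overlap}}{\text{n-grams in candidate}}, \qquad
R = \frac{\text{overlap}}{\text{n-grams in reference}}, \qquad
F_1 = \frac{2PR}{P + R}

% Candidate "the cat", reference "the cat sat on the mat": overlap = 2
P = \frac{2}{2} = 1, \qquad
R = \frac{2}{6} = \frac{1}{3}, \qquad
F_1 = \frac{2 \cdot 1 \cdot \tfrac{1}{3}}{1 + \tfrac{1}{3}} = \frac{1}{2}
```

The short candidate has perfect precision but covers only a third of the reference; its F1 of 0.5 sits below the arithmetic mean of 2/3 because the harmonic mean is pulled toward the smaller value.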
What does ROUGE primarily measure in text summaries?
ROUGE measures the overlap of words or phrases to evaluate how well the candidate summary matches the reference.
Which ROUGE metric uses longest common subsequence (LCS)?
ROUGE-L uses the longest common subsequence to capture sentence-level similarity.
ROUGE-2 evaluates overlap of which type of n-grams?
ROUGE-2 measures the overlap of bigrams: sequences of two adjacent words.
In ROUGE metrics, what does recall measure?
Recall measures the proportion of words (or n-grams) in the reference summary that also appear in the candidate summary.
Why is F1 score important in ROUGE evaluation?
The F1 score is the harmonic mean of precision and recall, giving a single number that is high only when both precision and recall are high.
Describe what ROUGE evaluation metrics are and why they are used in NLP.
Think about how we check if a summary made by a computer matches a human summary.
Explain the difference between ROUGE-N and ROUGE-L metrics.
Consider how sequences of words are matched differently.