Recall & Review
beginner
What does BLEU stand for in NLP?
BLEU stands for Bilingual Evaluation Understudy. It is a metric for evaluating how closely machine-generated text matches a human-written reference text.
beginner
How does BLEU score measure similarity between texts?
BLEU measures similarity by counting the n-grams (short word sequences) that the machine output shares with the reference text, then combining those counts into a score between 0 and 1.
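The counting described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes a single reference and pre-tokenized input, and the function names (`ngrams`, `clipped_precision`) are made up for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also appear in the reference,
    with counts clipped so a repeated word can't inflate the score."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return matches / total if total else 0.0

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(clipped_precision(cand, ref, 1))  # 5 of the 6 unigrams match
```

The clipping step is the detail that makes BLEU's precision "modified": without it, a candidate that just repeats a common reference word (e.g. "the the the the") would score perfectly on unigrams.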
intermediate
Why does BLEU use a brevity penalty?
BLEU uses a brevity penalty to avoid giving high scores to very short machine outputs that match parts of the reference but miss most content.
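The standard brevity penalty is 1 when the candidate is at least as long as the reference and decays exponentially as the candidate gets shorter. A minimal sketch (single reference, lengths in tokens):

```python
import math

def brevity_penalty(cand_len, ref_len):
    """BLEU brevity penalty: 1.0 when the candidate is at least as long
    as the reference, exp(1 - ref_len/cand_len) when it is shorter."""
    if cand_len >= ref_len:
        return 1.0
    return math.exp(1 - ref_len / cand_len)

print(brevity_penalty(4, 10))  # a 4-word candidate against a 10-word reference is penalized heavily
```

Note that the penalty only ever shrinks the score; there is no bonus for outputs longer than the reference, since over-long outputs already pay for their extra words through lower n-gram precision.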
beginner
What is an n-gram in the context of BLEU score?
An n-gram is a sequence of n words. For example, a 2-gram (bigram) is two words in a row. BLEU compares n-grams from machine output and reference to check similarity.
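Extracting n-grams is just sliding a window of size n over the tokens. A quick sketch (the helper name `ngrams` is illustrative):

```python
def ngrams(tokens, n):
    """Slide a window of size n over the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the quick brown fox".split()
print(ngrams(tokens, 2))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```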
beginner
What does a BLEU score of 1.0 mean?
A BLEU score of 1.0 means the machine-generated text matches the reference exactly, n-gram for n-gram and in length. In practice this is almost never achieved, even by human translators.
What does BLEU score primarily compare between machine output and reference?
BLEU score compares matching n-grams (word sequences) between machine output and reference to measure similarity.
Why is a brevity penalty used in BLEU score?
The brevity penalty prevents giving high scores to very short outputs that match only part of the reference.
Which of these is NOT a component of BLEU score calculation?
BLEU combines the precision of n-grams (not recall) with a brevity penalty, taking a geometric mean over the n-gram orders.
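Putting the pieces together, sentence-level BLEU is the geometric mean of the clipped n-gram precisions (typically n = 1..4) multiplied by the brevity penalty. The toy sketch below assumes a single reference and pre-tokenized input, and omits the smoothing that real implementations such as sacreBLEU apply; it is for illustration only.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Toy single-reference BLEU: geometric mean of clipped n-gram
    precisions for n = 1..max_n, times the brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0:
            return 0.0  # any zero precision drives the geometric mean to 0
        log_precisions.append(math.log(matches / total))
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "the cat is on the mat".split()
print(bleu(ref, ref))  # identical sentences score 1.0
```

The early return when a precision hits zero is why short sentences often need smoothing: one missing 4-gram order zeroes the whole score.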
If a machine translation has a BLEU score close to 0, what does it mean?
A BLEU score near 0 means very few or no matching n-grams with the reference text.
What is the typical range of BLEU scores?
BLEU scores range from 0 (no match) to 1 (perfect match), though they are often reported scaled to 0–100.
Explain how BLEU score evaluates machine translation quality.
Think about how small word groups are compared and how length affects the score.
Describe why BLEU score might not fully capture translation quality.
Consider what BLEU measures and what it might miss about good translation.