
BLEU score evaluation in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What does BLEU stand for in NLP?
BLEU stands for Bilingual Evaluation Understudy. It is a metric that measures how close machine-generated text is to one or more human reference texts.
beginner
How does BLEU score measure similarity between texts?
BLEU measures similarity by counting matching n-grams (small word groups) between the machine output and reference text, then combines these counts into a score between 0 and 1.
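The n-gram matching described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation; note that BLEU "clips" each candidate n-gram count at its count in the reference, so repeating a matching word cannot inflate the score.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_matches(candidate, reference, n):
    """Count candidate n-grams that also appear in the reference,
    clipping each n-gram's count at its count in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    return sum(min(count, ref_counts[g]) for g, count in cand_counts.items())

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(clipped_matches(cand, ref, 1))  # 5 matching unigrams
print(clipped_matches(cand, ref, 2))  # 3 matching bigrams
```

Dividing the clipped match count by the total number of candidate n-grams gives the n-gram precision that BLEU combines across several values of n.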
intermediate
Why does BLEU use a brevity penalty?
BLEU uses a brevity penalty to avoid giving high scores to very short machine outputs that match parts of the reference but miss most content.
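The standard brevity penalty is 1 when the candidate is at least as long as the reference, and decays exponentially as the candidate gets shorter. A minimal sketch:

```python
import math

def brevity_penalty(cand_len, ref_len):
    """BP = 1 if the candidate is at least as long as the reference,
    otherwise exp(1 - ref_len / cand_len)."""
    if cand_len >= ref_len:
        return 1.0
    return math.exp(1 - ref_len / cand_len)

print(brevity_penalty(6, 6))            # 1.0 — no penalty
print(round(brevity_penalty(3, 6), 3))  # 0.368 — half-length output is penalized
```

A candidate half the reference length is multiplied by exp(-1) ≈ 0.37, so even perfect n-gram precision on a fragment cannot earn a high BLEU score.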
beginner
What is an n-gram in the context of BLEU score?
An n-gram is a sequence of n words. For example, a 2-gram (bigram) is two words in a row. BLEU compares n-grams from machine output and reference to check similarity.
beginner
What does a BLEU score of 1.0 mean?
A BLEU score of 1.0 means the machine-generated text exactly matches the reference text in terms of n-grams and length.
What does BLEU score primarily compare between machine output and reference?
A. Sentence length only
B. Grammar correctness
C. Word frequency only
D. Matching n-grams
Why is a brevity penalty used in BLEU score?
A. To ignore punctuation
B. To reward shorter sentences
C. To penalize very short outputs that miss content
D. To increase score for longer sentences
Which of these is NOT a component of BLEU score calculation?
A. N-gram precision
B. Recall of words
C. Brevity penalty
D. Geometric mean of precisions
If a machine translation has a BLEU score close to 0, what does it mean?
A. It has almost no matching n-grams with the reference
B. It perfectly matches the reference
C. It is too short
D. It is grammatically correct
What is the typical range of BLEU scores?
A. 0 to 1
B. -1 to 1
C. 1 to 10
D. 0 to 100
Explain how BLEU score evaluates machine translation quality.
Think about how small word groups are compared and how length affects the score.
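The pieces above (clipped n-gram precision, geometric mean, brevity penalty) can be combined into a minimal single-reference BLEU. This is a simplified sketch for study purposes, without the smoothing that practical toolkits apply when some precision is zero:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Minimal single-reference BLEU: geometric mean of clipped
    n-gram precisions for n = 1..max_n, times the brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate too short to form any n-grams
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # any zero precision collapses the geometric mean
        precisions.append(clipped / total)
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
print(bleu(ref, ref))  # 1.0 — an exact match scores 1.0
```

Note what this computation ignores: synonyms, word order beyond 4-grams, and meaning. A fluent paraphrase with different wording can score near 0, which is why BLEU is a proxy for quality rather than a full measure of it.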
Describe why BLEU score might not fully capture translation quality.
Consider what BLEU measures and what it might miss about good translation.