Recall & Review
beginner
What does BLEU stand for in NLP?
BLEU stands for Bilingual Evaluation Understudy. It is a metric to evaluate how close a machine-generated text is to a human reference text.
beginner
How does BLEU score measure similarity between texts?
BLEU measures similarity by counting matching n-grams (small word groups) between the machine output and reference text, then combines these counts into a score between 0 and 1.
intermediate
Why does BLEU use a brevity penalty?
BLEU uses a brevity penalty to avoid giving high scores to very short machine outputs that match parts of the reference but miss most content.
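As a minimal sketch of the penalty described above, assuming the standard definition (1 when the candidate is longer than the reference, otherwise exp(1 - r/c)); the function name `brevity_penalty` and the example lengths are illustrative, not taken from the original text:

```python
import math


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """Return the BLEU brevity penalty for the given token counts."""
    if candidate_len == 0:
        # An empty candidate gets no credit at all.
        return 0.0
    if candidate_len > reference_len:
        # Candidates longer than the reference are not penalized here.
        return 1.0
    # Shorter (or equal-length) candidates are scaled by exp(1 - r/c).
    return math.exp(1.0 - reference_len / candidate_len)


same_length = brevity_penalty(7, 7)  # no penalty when lengths match
very_short = brevity_penalty(2, 7)   # heavy penalty for a 2-token output
print(same_length, very_short)
```

A two-token output that matches the reference perfectly would still have its score multiplied by roughly 0.08, which is how the penalty stops very short outputs from scoring well.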
beginner
What is an n-gram in the context of BLEU score?
An n-gram is a sequence of n words. For example, a 2-gram (bigram) is two words in a row. BLEU compares n-grams from machine output and reference to check similarity.
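To make the definition concrete, here is a small sketch that extracts n-grams from a tokenized sentence. The helper name `ngrams` and the example sentence are chosen for illustration only:

```python
def ngrams(tokens, n):
    """Return all contiguous n-word sequences in tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the cat is on the mat".split()
unigrams = ngrams(tokens, 1)  # six 1-grams, one per word
bigrams = ngrams(tokens, 2)   # five 2-grams, e.g. ("the", "cat")
print(bigrams)
```

A sentence with k tokens has k - n + 1 n-grams, so a sentence shorter than n yields none.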
beginner
What does a BLEU score of 1.0 mean?
A BLEU score of 1.0 means the machine-generated text exactly matches the reference text in terms of n-grams and length.
What does BLEU score primarily compare between machine output and reference?
A. Sentence length only
B. Grammar correctness
C. Word frequency only
D. Matching n-grams
BLEU score compares matching n-grams (word sequences) between machine output and reference to measure similarity.
Why is a brevity penalty used in BLEU score?
A. To ignore punctuation
B. To reward shorter sentences
C. To penalize very short outputs that miss content
D. To increase score for longer sentences
The brevity penalty prevents giving high scores to very short outputs that match only part of the reference.
Which of these is NOT a component of BLEU score calculation?
A. N-gram precision
B. Recall of words
C. Brevity penalty
D. Geometric mean of precisions
BLEU uses precision of n-grams, not recall, combined with brevity penalty and geometric mean.
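The three components named above fit together as follows in the standard definition of BLEU. The symbols are notation introduced here for illustration: $p_n$ is the clipped n-gram precision, $w_n$ its weight (commonly $1/N$ with $N = 4$), $c$ the candidate length, and $r$ the reference length.

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r, \\
e^{\,1 - r/c} & \text{if } c \le r.
\end{cases}
```

With equal weights, the exponential of the weighted log sum is the geometric mean of the precisions, so a single precision of zero pulls the whole score to zero.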
If a machine translation has a BLEU score close to 0, what does it mean?
A. It has almost no matching n-grams with reference
B. It perfectly matches the reference
C. It is too short
D. It is grammatically correct
A BLEU score near 0 means very few or no matching n-grams with the reference text.
What is the typical range of BLEU scores?
A. 0 to 1
B. -1 to 1
C. 1 to 10
D. 0 to 100
BLEU scores range from 0 (no match) to 1 (perfect match). Some tools report the same value as a percentage from 0 to 100.
Explain how BLEU score evaluates machine translation quality.
Think about how small word groups are compared and how length affects the score.
Describe why BLEU score might not fully capture translation quality.
Consider what BLEU measures and what it might miss about good translation.
Practice
1. What does the BLEU score primarily measure in machine translation?
easy
A. How close the machine translation is to human translations
B. The speed of the translation process
C. The number of words in the translated sentence
D. The grammar correctness of the translation
Solution
Step 1: Understand BLEU score purpose
BLEU score is designed to compare machine translations to human reference translations.
Step 2: Identify what BLEU measures
It measures similarity in words and phrases, not speed or grammar correctness.
Final Answer:
How close the machine translation is to human translations -> Option A
Quick Check:
BLEU = similarity to human translations [OK]
Hint: BLEU = closeness to human translation quality [OK]
Common Mistakes:
Confusing BLEU with translation speed
Thinking BLEU measures grammar correctness
Assuming BLEU counts total words only
2. Which of the following is the correct way to calculate the BLEU score using NLTK in Python?
easy
A. bleu_score = nltk.bleu_score(candidate, [reference])
B. bleu_score = nltk.translate.bleu_score.sentence_bleu([reference], candidate)
C. bleu_score = nltk.translate.bleu_score(candidate, reference)
D. bleu_score = nltk.translate.bleu_score.score(candidate, reference)
Solution
Step 1: Recall NLTK BLEU function syntax
The correct function is sentence_bleu and it takes a list of references and a candidate sentence.
Step 2: Match correct argument order
The references come first and must be a list of token lists (one inner list per reference); the candidate comes second and is a single list of tokens.
Final Answer:
bleu_score = nltk.translate.bleu_score.sentence_bleu([reference], candidate) -> Option B
Quick Check:
Use sentence_bleu([ref], cand) syntax [OK]
Hint: Use sentence_bleu with references as list of lists [OK]
Common Mistakes:
Passing candidate before reference
Not wrapping reference in a list
Using incorrect function names
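A minimal runnable version of the call in Option B, assuming NLTK is installed. The two token lists are illustrative and deliberately identical, so the score should come out as a perfect match:

```python
from nltk.translate.bleu_score import sentence_bleu

# Illustrative, pre-tokenized sentences (not from the original question).
reference = ["the", "cat", "is", "on", "the", "mat"]
candidate = ["the", "cat", "is", "on", "the", "mat"]

# First argument: list of references, each a list of tokens.
# Second argument: the candidate, a single list of tokens.
score = sentence_bleu([reference], candidate)
print(score)
```

Note that the single reference is still wrapped in an outer list; the candidate is not.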
3. Given the candidate sentence ["the", "cat", "is", "on", "the", "mat"] and reference sentence ["there", "is", "a", "cat", "on", "the", "mat"], what is the approximate BLEU score (unigram precision only)?
medium
A. 0.83
B. 0.50
C. 0.67
D. 0.33
Solution
Step 1: Calculate unigram matches
Candidate words: the, cat, is, on, the, mat (6 tokens).
Reference words: there, is, a, cat, on, the, mat (7 tokens).
Candidate words that also appear in the reference: the, cat, is, on, mat.
Step 2: Adjust for max counts
'the' appears twice in the candidate but only once in the reference, so only one 'the' counts. Clipped matches: the (1) + cat (1) + is (1) + on (1) + mat (1) = 5.
Step 3: Compute unigram precision
Precision = clipped matches / candidate length = 5/6 ≈ 0.83. The question asks for unigram precision only, so the brevity penalty for the shorter candidate is not applied.
Final Answer:
0.83 -> Option A
Quick Check:
Unigram precision = 5/6 = 0.83 [OK]
Hint: Count max reference word matches for unigram precision [OK]
Common Mistakes:
Counting repeated words more than reference max
Confusing unigram with bigram precision
Ignoring max count clipping
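The clipping step in this solution can be checked with a short sketch. The helper name `clipped_unigram_precision` is hypothetical and the code covers the single-reference case only:

```python
from collections import Counter


def clipped_unigram_precision(candidate, reference):
    """Unigram precision where each word counts at most as often as it
    appears in the reference (single-reference case)."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Clip each candidate word count by its count in the reference.
    clipped = sum(min(count, ref_counts[word])
                  for word, count in cand_counts.items())
    return clipped / len(candidate)


candidate = ["the", "cat", "is", "on", "the", "mat"]
reference = ["there", "is", "a", "cat", "on", "the", "mat"]

precision = clipped_unigram_precision(candidate, reference)
print(round(precision, 2))  # 0.83, i.e. 5 clipped matches out of 6 tokens
```

Without the `min(...)` clip, both occurrences of 'the' would count and the precision would wrongly come out as 6/6.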
4. Identify the error in this BLEU score calculation code snippet:
C. Reference should be a list of lists, not a single list
D. sentence_bleu requires lowercase strings only
Solution
Step 1: Check sentence_bleu input format
sentence_bleu expects references as a list of reference sentences (each a list of tokens), so reference must be wrapped in another list.
Step 2: Identify the error in code
Reference is given as a single flat list of tokens instead of a list of lists, so each token string is treated as a separate reference and the score comes out wrong.
Final Answer:
Reference should be a list of lists, not a single list -> Option C
Quick Check:
References = list of lists [OK]
Hint: Wrap reference in a list for sentence_bleu [OK]
Common Mistakes:
Passing reference as a flat list
Passing candidate as string instead of list
Ignoring input format requirements
5. You have two reference translations: ref1 = ['the', 'cat', 'is', 'on', 'the', 'mat'] ref2 = ['there', 'is', 'a', 'cat', 'on', 'the', 'mat'] And a candidate translation: candidate = ['the', 'cat', 'sat', 'on', 'the', 'mat'] How should you prepare the references to correctly compute the BLEU score considering multiple references?
hard
A. Pass references as separate calls to sentence_bleu
B. Concatenate ref1 and ref2 into a single list and pass as one reference
C. Pass only the reference closest in length to candidate
D. Pass references as a list containing both ref1 and ref2 lists
Solution
Step 1: Understand multiple references in BLEU
BLEU supports multiple references by passing a list of reference sentences (each a list of tokens).
Step 2: Prepare references correctly
References should be passed as [ref1, ref2], a list containing both reference lists.
Step 3: Avoid incorrect methods
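Concatenating the references creates one artificially long reference, and separate calls score the candidate against each reference in isolation. Neither is the same as multi-reference BLEU.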
Final Answer:
Pass references as a list containing both ref1 and ref2 lists -> Option D
Quick Check:
Multiple references = list of reference lists [OK]
Hint: Use list of reference lists for multiple references [OK]
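The sketch below runs the setup from this question with NLTK (assumed to be installed) and compares it against the concatenation mistake. The `weights=(0.5, 0.5)` setting, which uses only unigrams and bigrams, is an illustrative choice made here because this candidate shares no 4-gram with either reference; it is not part of the original question.

```python
from nltk.translate.bleu_score import sentence_bleu

ref1 = ["the", "cat", "is", "on", "the", "mat"]
ref2 = ["there", "is", "a", "cat", "on", "the", "mat"]
candidate = ["the", "cat", "sat", "on", "the", "mat"]

# Illustrative assumption: score unigrams and bigrams only.
weights = (0.5, 0.5)

# Correct: one outer list holding both reference token lists.
multi_ref_score = sentence_bleu([ref1, ref2], candidate, weights=weights)

# Incorrect: one 13-token pseudo-reference, which makes the candidate
# look far too short and triggers a large brevity penalty.
concatenated_score = sentence_bleu([ref1 + ref2], candidate, weights=weights)

print(multi_ref_score, concatenated_score)
```

With both references passed as a list, the candidate length is compared against the closest reference length (6 tokens here), so no brevity penalty applies; the concatenated version is penalized for being shorter than 13 tokens.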