0
0
NLPml~20 mins

BLEU score evaluation in NLP - Practice Problems & Coding Challenges

Choose your learning style9 modes available
Challenge - 5 Problems
🎖️
BLEU Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate
1:30remaining
Understanding BLEU score basics

What does the BLEU score primarily measure in machine translation?

AThe speed at which the translation model produces output
BThe grammatical correctness of the translated sentence
CThe similarity between a machine-generated translation and one or more reference translations based on overlapping n-grams
DThe semantic meaning similarity using word embeddings
Attempts:
2 left
💡 Hint

Think about what BLEU compares between the candidate and reference sentences.

Predict Output
intermediate
1:30remaining
BLEU score output for simple sentences

What is the BLEU score output of the following Python code?

NLP
from nltk.translate.bleu_score import sentence_bleu
reference = [['the', 'cat', 'is', 'on', 'the', 'mat']]
candidate = ['the', 'cat', 'is', 'on', 'the', 'mat']
score = sentence_bleu(reference, candidate)
print(round(score, 2))
A1.0
B0.75
C0.5
D0.0
Attempts:
2 left
💡 Hint

Consider what happens when the candidate exactly matches the reference.

Model Choice
advanced
2:00remaining
Choosing a model for BLEU score evaluation

You want to evaluate a machine translation model's output using BLEU score. Which model output is best suited for BLEU evaluation?

AProbability distributions over vocabulary for each word
BRaw text strings without tokenization
CVector embeddings of sentences
DA list of tokenized sentences generated by the model
Attempts:
2 left
💡 Hint

BLEU score compares n-grams, so think about the input format it requires.

Metrics
advanced
1:30remaining
Interpreting BLEU score values

Which statement about BLEU scores is correct?

AA BLEU score of 0.8 means the candidate translation is perfect
BBLEU scores are between 0 and 1, where higher is better
CBLEU scores closer to 0 indicate better translation quality
DBLEU scores measure recall of n-grams only
Attempts:
2 left
💡 Hint

Think about the range and meaning of BLEU scores.

🔧 Debug
expert
2:00remaining
Debugging BLEU score calculation code

What error does the following code raise when calculating BLEU score?

from nltk.translate.bleu_score import sentence_bleu
reference = ['the', 'cat', 'sat']
candidate = ['the', 'cat', 'sat']
score = sentence_bleu(reference, candidate)
print(score)
ATypeError: list of references should be a list of lists
BSyntaxError: invalid syntax
CValueError: candidate sentence empty
DNo error, prints 1.0
Attempts:
2 left
💡 Hint

Check the expected input format for references in sentence_bleu.