BLEU (Bilingual Evaluation Understudy) score measures how close a machine-generated translation is to one or more human reference translations. It gives a single number that summarizes how well the machine is doing.
BLEU score evaluation in NLP
Introduction
BLEU score is useful in situations such as:
When you want to see how well a machine translated a sentence compared to a human translation.
When testing different translation models to pick the best one.
When improving a chatbot's language by checking its responses against correct answers.
When comparing summaries or paraphrases generated by a computer to original texts.
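The model-comparison use case above can be sketched with NLTK's corpus_bleu, which scores a whole test set at once. The sentences and the two "model" outputs below are made up for illustration:

```python
from nltk.translate.bleu_score import corpus_bleu

# Tokenized reference translations: one list of references per test sentence
references = [
    [['the', 'cat', 'is', 'on', 'the', 'mat']],
    [['there', 'is', 'a', 'cat', 'on', 'the', 'mat']],
]

# Hypothetical outputs from two translation models
model_a = [
    ['the', 'cat', 'is', 'on', 'the', 'mat'],
    ['there', 'is', 'a', 'cat', 'on', 'the', 'mat'],
]
model_b = [
    ['the', 'cat', 'is', 'on', 'a', 'mat'],
    ['there', 'is', 'a', 'cat', 'on', 'mat'],
]

score_a = corpus_bleu(references, model_a)
score_b = corpus_bleu(references, model_b)
print(f"Model A: {score_a:.4f}, Model B: {score_b:.4f}")
```

Model A reproduces the references exactly, so it scores 1.0; Model B's small word changes lower its score, which is how you would pick between the two.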
Syntax
from nltk.translate.bleu_score import sentence_bleu

reference = [['this', 'is', 'a', 'test']]
candidate = ['this', 'is', 'a', 'test']

score = sentence_bleu(reference, candidate)
print(score)
The reference is a list of correct translations (each is a list of words).
The candidate is the machine's translation (a list of words).
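Note that sentence_bleu expects lists of tokens, not raw strings. A minimal way to get tokens is whitespace splitting (a real pipeline would use a proper tokenizer):

```python
from nltk.translate.bleu_score import sentence_bleu

reference_text = "this is a test"
candidate_text = "this is a test"

# Split raw strings into token lists before scoring
reference = [reference_text.split()]  # list of references, each a token list
candidate = candidate_text.split()    # a single token list

score = sentence_bleu(reference, candidate)
print(score)  # identical sentences give a perfect score of 1.0
```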
Examples
Compare a candidate sentence with one reference sentence.
from nltk.translate.bleu_score import sentence_bleu

reference = [['the', 'cat', 'is', 'on', 'the', 'mat']]
candidate = ['the', 'cat', 'sat', 'on', 'the', 'mat']

score = sentence_bleu(reference, candidate)
print(score)
Compare candidate with multiple reference sentences.
from nltk.translate.bleu_score import sentence_bleu

references = [['this', 'is', 'a', 'test'], ['this', 'is', 'test']]
candidate = ['this', 'is', 'a', 'test']

score = sentence_bleu(references, candidate)
print(score)
Sample Model
This program calculates the BLEU score between one human reference and one machine candidate sentence. It shows how close the machine's sentence is to the human's.
from nltk.translate.bleu_score import sentence_bleu

# One reference translation
reference = [['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']]

# Candidate translation from the machine
candidate = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

# Calculate the BLEU score
score = sentence_bleu(reference, candidate)
print(f"BLEU score: {score:.4f}")
Output
BLEU score: 0.5970
Important Notes
BLEU score ranges from 0 to 1, where 1 means perfect match.
BLEU uses matching of small word groups (called n-grams) to compare sentences.
Candidates shorter than the reference are penalized by a brevity penalty, so a very short candidate can score low even if every word it contains is correct.
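The notes above can be seen in code. By default sentence_bleu mixes 1- to 4-gram precision, so a very short candidate with no matching 4-grams collapses to 0 unless you smooth the counts or restrict the n-gram orders via the weights argument. This sketch shows both options:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [['the', 'cat', 'is', 'on', 'the', 'mat']]
candidate = ['the', 'cat', 'the', 'mat']  # short but mostly correct words

# Option 1: smooth zero n-gram counts so the score stays above 0
smoothie = SmoothingFunction().method1
smoothed = sentence_bleu(reference, candidate, smoothing_function=smoothie)

# Option 2: weight only unigrams and bigrams, ignoring 3- and 4-grams
bigram_only = sentence_bleu(reference, candidate, weights=(0.5, 0.5))

print(f"smoothed: {smoothed:.4f}, bigram-only: {bigram_only:.4f}")
```

Both scores stay below 1.0 partly because of the brevity penalty: the candidate (4 tokens) is shorter than the reference (6 tokens).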
Summary
BLEU score measures how close a machine translation is to human translations.
It compares words and word groups between candidate and reference sentences.
Higher BLEU means better translation quality.