For translation tasks, the goal is to produce target-language text that preserves the meaning and style of the source. The most common automatic metric is BLEU (Bilingual Evaluation Understudy). BLEU compares short word sequences (n-grams, typically up to length four) in the model output against one or more human reference translations: it computes clipped n-gram precisions, combines them with a geometric mean, and applies a brevity penalty to discourage translations that are shorter than the reference. A higher BLEU score indicates greater n-gram overlap with the references, which correlates (imperfectly) with accuracy and fluency.
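The n-gram matching described above can be made concrete with a minimal, single-reference sketch of BLEU. This is a simplified illustration (the `bleu` function and its tokenization by whitespace are assumptions for this example, not a production implementation; real toolkits handle multiple references and smoothing):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Minimal single-reference BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a word cannot inflate the score.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

An exact match scores 1.0, and a near-match scores between 0 and 1. In practice one would use an established implementation such as NLTK's `sentence_bleu` or the sacreBLEU package rather than hand-rolling the metric.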
Other metrics also exist: METEOR adds stem and synonym matching, while ROUGE is used mainly for summarization. BLEU, however, remains the standard choice for quick translation-quality checks.