Experiment - BLEU score evaluation
Problem: You have a machine translation model that translates English sentences into French. You want to evaluate the quality of its translations against human translations using the BLEU score metric.
Current Metrics: BLEU score: 0.45 (45%) on the test set
Issue: The BLEU score is moderate, but the evaluation itself can be made more reliable. Sentence-level BLEU without smoothing collapses to zero whenever any higher-order n-gram has no match, and scoring against a single reference penalizes valid alternative phrasings. Computing BLEU with smoothing and multiple references per sentence gives a more stable estimate.
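A minimal sketch of how such a computation could look, using a self-contained pure-Python BLEU with clipped n-gram counts over multiple references, add-k smoothing, and the standard brevity penalty (the example sentences are made up for illustration; a real evaluation would use a library implementation such as NLTK's or sacreBLEU):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, references, max_n=4, smooth=1.0):
    """Sentence-level BLEU with multiple references and add-k smoothing."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # Clip each hypothesis n-gram count by its maximum count
        # across all references (standard modified precision).
        max_ref = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        # Add-k smoothing keeps a zero n-gram match from
        # zeroing out the whole geometric mean.
        precisions.append((clipped + smooth) / (total + smooth))
    # Brevity penalty against the reference length closest to the hypothesis.
    hyp_len = len(hypothesis)
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in references)[1]
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    return bp * geo_mean

# Hypothetical tokenized example with two references per hypothesis.
refs = [["the", "cat", "is", "on", "the", "mat"],
        ["there", "is", "a", "cat", "on", "the", "mat"]]
hyp = ["the", "cat", "sat", "on", "the", "mat"]
print(round(bleu(hyp, refs), 3))
```

Add-one smoothing is only one choice; other schemes (e.g. the methods in NLTK's `SmoothingFunction`) trade off differently at short sentence lengths, so the scheme used should be reported alongside the score.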