Complete the code to import the function that loads the ROUGE metric from the datasets library.
from datasets import [1]
The load_metric function loads evaluation metrics such as ROUGE by name. Newer releases of datasets deprecate it in favor of the separate evaluate library.
Complete the code to compute ROUGE scores given predictions and references.
rouge = load_metric('rouge')
results = rouge.[1](predictions=preds, references=refs)
The compute method calculates the ROUGE scores from predictions and references.
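To make concrete what kind of number comes back, here is a minimal sketch of ROUGE-1 for a single prediction/reference pair. It is a simplification under stated assumptions, not the library's implementation: the hypothetical helper rouge1_sketch tokenizes by lowercasing and splitting on whitespace, applies no stemming, and does no aggregation over multiple examples.

```python
from collections import Counter


def rouge1_sketch(prediction, reference):
    """Unigram-overlap precision, recall, and F1 for one pair (simplified)."""
    # Assumption: lowercase + whitespace tokenization; real ROUGE tokenizers differ.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    if precision + recall == 0:
        fmeasure = 0.0
    else:
        fmeasure = 2 * precision * recall / (precision + recall)
    return precision, recall, fmeasure


precision, recall, fmeasure = rouge1_sketch(
    "the cat sat on the mat", "the cat is on the mat"
)
```

Here five of the six unigrams overlap, so precision, recall, and F1 all come out to 5/6.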
Fix the error in the code to correctly prepare the predictions for ROUGE evaluation by removing extra spaces.
clean_preds = [pred.strip() for pred in [1]]
We clean the predictions by stripping leading and trailing whitespace from each one before evaluation.
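A small illustration of what strip() does and does not remove. The list raw_outputs and its contents are made up for this example.

```python
# Hypothetical model outputs with stray whitespace at the edges.
raw_outputs = ["  The cat sat on the mat. ", "A quick summary.\n", "\tTwo  spaces inside"]

# str.strip() removes leading and trailing whitespace (spaces, tabs, newlines).
cleaned = [text.strip() for text in raw_outputs]

# Whitespace inside the string is left untouched.
inner_preserved = "  " in cleaned[2]
```

Because only the ends are trimmed, repeated spaces inside a prediction survive and would need separate handling if they matter for the evaluation.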
Fill both blanks to extract the ROUGE-L F1 score from the results dictionary.
rouge_l_f1 = results['rougeL'].[1].[2]
The mid attribute holds the middle (median) value of the aggregated scores, between low and high, and fmeasure is the F1 score for ROUGE-L.
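The nesting can be explored without installing anything by mocking the result objects. The sketch below assumes the field names used by the rouge_score package (low, mid, high and precision, recall, fmeasure); the namedtuples are stand-ins defined here, and all numbers are invented.

```python
from collections import namedtuple

# Stand-ins that mirror the assumed field names of the real result objects.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
AggregateScore = namedtuple("AggregateScore", ["low", "mid", "high"])

# Made-up values, for illustration only.
results = {
    "rougeL": AggregateScore(
        low=Score(precision=0.40, recall=0.35, fmeasure=0.37),
        mid=Score(precision=0.50, recall=0.45, fmeasure=0.47),
        high=Score(precision=0.60, recall=0.55, fmeasure=0.57),
    )
}

# Two attribute lookups: first the aggregate level, then the individual score.
rouge_l_mid = results["rougeL"].mid
rouge_l_f1 = rouge_l_mid.fmeasure
```

The dictionary lookup selects the metric, the first attribute selects the point in the aggregate, and the second attribute selects precision, recall, or F1.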
Fill all three blanks to create a dictionary of ROUGE-1, ROUGE-2, and ROUGE-L F1 scores.
scores = {
'rouge1': results['rouge1'].[1].[2],
'rouge2': results['rouge2'].[1].[3],
'rougeL': results['rougeL'].[1].[2]
}
We use mid for the median score and fmeasure for the F1 score. ROUGE-2 uses fmeasure as well, so blanks [2] and [3] take the same value.
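Since the same access pattern repeats for every metric, a dictionary comprehension can build the summary in one step. This is a sketch only: mid_f1_scores is a hypothetical helper, the namedtuples are stand-ins assumed to match the real result objects' field names, and the numbers are invented.

```python
from collections import namedtuple

# Stand-ins with the assumed field names of the real result objects.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
AggregateScore = namedtuple("AggregateScore", ["low", "mid", "high"])


def mid_f1_scores(results, keys=("rouge1", "rouge2", "rougeL")):
    """Return {metric name: median F1} for the requested metrics."""
    return {key: results[key].mid.fmeasure for key in keys}


def fake_aggregate(f1):
    # Hypothetical helper: reuse one Score for low, mid, and high to keep the mock short.
    score = Score(precision=f1, recall=f1, fmeasure=f1)
    return AggregateScore(low=score, mid=score, high=score)


# Made-up values, for illustration only.
mock_results = {
    "rouge1": fake_aggregate(0.45),
    "rouge2": fake_aggregate(0.21),
    "rougeL": fake_aggregate(0.40),
}

summary = mid_f1_scores(mock_results)
```

Passing a different keys tuple restricts the summary to a subset of metrics without changing the helper.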