NLP · ~10 mins

ROUGE evaluation metrics in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to import the ROUGE metric from the datasets library.

from datasets import [1]
A. load_dataset
B. load_metric
C. load_rouge
D. load_eval
Common Mistakes
Using load_dataset instead of load_metric
Trying to import a non-existent function such as load_rouge
Task 2: Fill in the blank (medium)

Complete the code to compute ROUGE scores given predictions and references.

rouge = load_metric('rouge')
results = rouge.[1](predictions=preds, references=refs)
A. score
B. evaluate
C. compute
D. calculate
Common Mistakes
Using evaluate or score instead of compute
Passing the wrong argument names (the method expects predictions= and references=)
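For intuition about what compute() measures, ROUGE-1 is essentially unigram-overlap precision, recall, and F1. The sketch below is a minimal hand-rolled illustration of that idea, not the library's implementation (which also handles stemming, ROUGE-2/L variants, and bootstrap aggregation):

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1, the core idea behind ROUGE-1."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: count each shared token at most as often
    # as it appears in either side.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat", "the cat sat on the mat"))  # 0.666...
```

Here the prediction's three tokens all appear in the reference (precision 1.0), but the reference has six tokens (recall 0.5), giving F1 = 2/3.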
Task 3: Fill in the blank (hard)

Fix the error in the code to correctly prepare the predictions for ROUGE evaluation by removing extra spaces.

clean_preds = [pred.strip() for pred in [1]]
A. predictions
B. rouge
C. results
D. refs
Common Mistakes
Cleaning references instead of predictions
Trying to clean the results or rouge object
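To see the fix in action, here is the cleaning step applied to a hypothetical predictions list: str.strip() removes leading and trailing whitespace (including newlines) from each string without touching interior spaces.

```python
# Hypothetical raw model outputs with stray whitespace.
predictions = ["  ROUGE measures overlap.  ", "Summaries are compared.\n"]

# Strip leading/trailing whitespace from each prediction before scoring.
clean_preds = [pred.strip() for pred in predictions]
print(clean_preds)  # ['ROUGE measures overlap.', 'Summaries are compared.']
```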
Task 4: Fill in the blank (hard)

Fill both blanks to compute ROUGE-L F1 score from the results dictionary.

rouge_l_f1 = results['rougeL'].[1].[2]
A. mid
B. fmeasure
C. precision
D. recall
Common Mistakes
Using precision or recall instead of fmeasure
Using mean instead of mid
Task 5: Fill in the blank (hard)

Fill all three blanks to create a dictionary of ROUGE-1, ROUGE-2, and ROUGE-L F1 scores.

scores = {
    'rouge1': results['rouge1'].[1].[2],
    'rouge2': results['rouge2'].[1].[3],
    'rougeL': results['rougeL'].[1].[2]
}
A. mid
B. fmeasure
C. recall
D. precision
Common Mistakes
Mixing precision and recall incorrectly
Using mean instead of mid
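For readers without the library installed, the access pattern drilled in the last two tasks can be sketched with stand-in objects. In the datasets versions this quiz assumes, rouge.compute() returns AggregateScore objects (from the rouge_score package) holding low/mid/high Score tuples; the numbers below are invented for illustration only.

```python
from collections import namedtuple

# Mock of rouge_score's Score and AggregateScore structures;
# all values here are made up for illustration.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
AggregateScore = namedtuple("AggregateScore", ["low", "mid", "high"])

results = {
    "rouge1": AggregateScore(Score(0.50, 0.45, 0.47),
                             Score(0.55, 0.50, 0.52),
                             Score(0.60, 0.55, 0.57)),
    "rouge2": AggregateScore(Score(0.30, 0.25, 0.27),
                             Score(0.35, 0.30, 0.32),
                             Score(0.40, 0.35, 0.37)),
    "rougeL": AggregateScore(Score(0.45, 0.40, 0.42),
                             Score(0.50, 0.45, 0.47),
                             Score(0.55, 0.50, 0.52)),
}

# Same result as Task 5's explicit dictionary literal:
# take the point estimate (mid) F1 (fmeasure) for each ROUGE variant.
scores = {name: agg.mid.fmeasure for name, agg in results.items()}
print(scores)  # {'rouge1': 0.52, 'rouge2': 0.32, 'rougeL': 0.47}
```

The comprehension builds the same scores dictionary as Task 5's literal; the key point is the two-step access, .mid for the aggregate point estimate, then .fmeasure for F1.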