Complete the code to define a custom evaluation metric function that returns accuracy.
def custom_metric(predictions, references):
    correct = sum(p == r for p, r in zip(predictions, references))
    total = len(predictions)
    return correct [1] total
The custom metric calculates accuracy by dividing the number of correct predictions by the total number of predictions.
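As a quick sanity check, here is the function with the blank filled by the division operator, applied to a small made-up example:

```python
def custom_metric(predictions, references):
    # Count positions where the prediction matches the reference.
    correct = sum(p == r for p, r in zip(predictions, references))
    total = len(predictions)
    # Accuracy is the fraction of correct predictions.
    return correct / total

# 3 of 4 predictions match the references.
print(custom_metric(["a", "b", "c", "d"], ["a", "b", "x", "d"]))  # 0.75
```

Note that `zip` stops at the shorter of the two lists, so mismatched lengths silently shrink the numerator while `total` still uses `len(predictions)`.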
Complete the code to register the custom metric in LangChain's evaluation framework.
from langchain.evaluation import Evaluation

evaluator = Evaluation()
evaluator.register_metric('accuracy', [1])
You register the function itself, not the result of calling it, so pass the function name without parentheses.
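The register-by-reference idea is independent of any particular framework. If the `Evaluation` class above is unavailable in your LangChain version, the same pattern can be sketched with a plain dictionary; `METRICS` and `register_metric` here are illustrative names, not a real API:

```python
# Minimal metric registry sketch (illustrative names, not LangChain's API).
METRICS = {}

def register_metric(name, fn):
    # Store the function object itself. No parentheses: the function
    # is not called at registration time, only looked up and called later.
    METRICS[name] = fn

def accuracy(predictions, references):
    return sum(p == r for p, r in zip(predictions, references)) / len(predictions)

register_metric("accuracy", accuracy)

# The registered function is invoked later, at evaluation time.
print(METRICS["accuracy"](["a", "b"], ["a", "c"]))  # 0.5
```

Passing `accuracy()` instead of `accuracy` would store the *result* of one call (or raise a `TypeError` for missing arguments), which is exactly the mistake the explanation warns against.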
Fix the error in the custom metric function to handle empty prediction lists safely.
def custom_metric(predictions, references):
    if len(predictions) == 0:
        return 0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct [1] len(predictions)
Division is needed to calculate accuracy; returning 0 for empty predictions avoids division by zero.
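With the blank filled by `/`, the guard clause can be exercised directly; the empty-list call returns 0 instead of raising `ZeroDivisionError`:

```python
def custom_metric(predictions, references):
    # Guard first: an empty list would make the division below
    # raise ZeroDivisionError.
    if len(predictions) == 0:
        return 0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(predictions)

print(custom_metric([], []))               # 0, no exception
print(custom_metric(["a", "b"], ["a", "b"]))  # 1.0
```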
Fill both blanks to create a custom metric that calculates F1 score using precision and recall.
def f1_score(precision, recall):
    return 2 * (precision [1] recall) [2] (precision + recall)
The F1 score formula is 2 * (precision * recall) / (precision + recall).
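Filling the blanks with `*` and `/` gives the harmonic mean of precision and recall; a couple of spot checks confirm the formula:

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall:
    # 2 * (precision * recall) / (precision + recall)
    return 2 * (precision * recall) / (precision + recall)

print(f1_score(0.5, 0.5))  # 0.5 (equal inputs give that same value back)
print(f1_score(1.0, 0.5))  # 2 * 0.5 / 1.5, i.e. 2/3
```

One caveat: if precision and recall are both 0, the denominator is 0 and this raises `ZeroDivisionError`; production implementations typically special-case that to return 0.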
Fill all three blanks to create a dictionary comprehension that maps each label to its F1 score if recall is above 0.5.
f1_scores = {label: f1_score(precision[label], recall[label])
             for label in labels if recall[label] [1] [2]
filtered_scores = {k: v for k, v in f1_scores.items() if v [3] 0.7}

The first comprehension keeps only labels with recall > 0.5; the second then keeps only scores >= 0.7.
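To see both filters in action, here is the completed pair of comprehensions run on small made-up precision/recall tables (the labels and values are illustrative, not from the exercise):

```python
def f1_score(precision, recall):
    return 2 * (precision * recall) / (precision + recall)

# Illustrative sample data.
labels = ["pos", "neg", "neu"]
precision = {"pos": 0.9, "neg": 0.8, "neu": 0.4}
recall = {"pos": 0.8, "neg": 0.4, "neu": 0.6}

# First comprehension: compute F1 only for labels with recall > 0.5.
# "neg" (recall 0.4) is dropped here.
f1_scores = {label: f1_score(precision[label], recall[label])
             for label in labels if recall[label] > 0.5}

# Second comprehension: keep only scores >= 0.7.
# "neu" (F1 = 0.48) is dropped here, leaving only "pos".
filtered_scores = {k: v for k, v in f1_scores.items() if v >= 0.7}

print(f1_scores)
print(filtered_scores)
```

Chaining two comprehensions like this keeps each filter readable; the alternative of nesting both conditions in one comprehension would discard the intermediate `f1_scores` dict, which is often useful for inspection.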