Consider this Python function used as a custom evaluation metric in Langchain:
def custom_metric(predictions, references):
    correct = sum(p == r for p, r in zip(predictions, references))
    total = len(references)
    return correct / total if total > 0 else 0

What is the output of custom_metric(['a', 'b', 'c'], ['a', 'x', 'c'])?
def custom_metric(predictions, references):
    correct = sum(p == r for p, r in zip(predictions, references))
    total = len(references)
    return correct / total if total > 0 else 0

result = custom_metric(['a', 'b', 'c'], ['a', 'x', 'c'])
Count how many predictions match the references exactly, then divide by total items.
The function counts matches pair by pair: 'a'=='a' (True), 'b'=='x' (False), 'c'=='c' (True). That gives 2 matches out of 3 references, so the result is 2/3 = 0.666...
Which of the following Python functions correctly defines a custom evaluation metric that returns the ratio of matching items between predictions and references?
Use zip to pair predictions and references correctly.
Option D correctly pairs each prediction with its reference and counts matches, then divides by total references.
Option D incorrectly uses nested loops causing overcounting.
Option D tries to compare lists directly, which is invalid.
Option D multiplies instead of dividing, giving wrong scale.
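The answer options themselves are not reproduced here, but based on the explanation of the correct choice, a matching-ratio metric along these lines (a minimal sketch; the name matching_ratio is illustrative) pairs each prediction with its reference via zip and divides by the number of references:

```python
def matching_ratio(predictions, references):
    # Pair each prediction with its corresponding reference and
    # count exact matches; zip avoids nested-loop overcounting.
    matches = sum(p == r for p, r in zip(predictions, references))
    # Divide by the total number of references, guarding empty input.
    return matches / len(references) if references else 0.0
```

For example, matching_ratio(['a', 'b'], ['a', 'c']) returns 0.5, since one of the two pairs matches.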
Given this custom metric function:
def metric(predictions, references):
    return sum(p == r for p, r in zip(predictions, references)) / len(predictions)

What error will occur if predictions is an empty list and references is non-empty?
def metric(predictions, references):
    return sum(p == r for p, r in zip(predictions, references)) / len(predictions)

result = metric([], ['a', 'b'])
Check what happens when dividing by the length of an empty list.
Since len(predictions) is zero, dividing by zero causes a ZeroDivisionError.
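A defensive rewrite (a sketch, not part of the original question) guards the divisor before dividing, so empty input returns a score instead of raising:

```python
def metric(predictions, references):
    # zip stops at the shorter sequence, so an empty predictions list
    # yields zero pairs; guard the divisor to avoid ZeroDivisionError.
    if not predictions:
        return 0.0
    return sum(p == r for p, r in zip(predictions, references)) / len(predictions)

metric([], ['a', 'b'])  # returns 0.0 instead of raising ZeroDivisionError
```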
Which reason best explains why you might create a custom evaluation metric instead of using built-in ones in Langchain?
Think about why general metrics might not fit every use case.
Custom metrics let you measure exactly what matters for your task, beyond generic built-in metrics.
Built-in metrics are not always inaccurate, and Langchain does not require custom metrics for all models.
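As a concrete illustration of a task-specific need (a hypothetical example, not a Langchain API): if your task treats 'Paris' and 'paris' as the same answer, a generic exact-match metric under-scores the model, while a custom case-insensitive variant measures what actually matters:

```python
def case_insensitive_match(predictions, references):
    # Task-specific behavior a generic exact-match metric would miss:
    # compare answers after lowercasing both sides.
    pairs = list(zip(predictions, references))
    if not pairs:
        return 0.0
    return sum(p.lower() == r.lower() for p, r in pairs) / len(pairs)
```

Here case_insensitive_match(['Paris'], ['paris']) returns 1.0, whereas an exact-match metric would return 0.0.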
Consider this code snippet used in Langchain to evaluate predictions:
class CustomMetric:
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, predictions, references):
        for p, r in zip(predictions, references):
            if p == r:
                self.correct += 1
            self.total += 1

    def compute(self):
        return self.correct / self.total if self.total > 0 else 0

metric = CustomMetric()
metric.update(['a', 'b'], ['a', 'x'])
metric.update(['c'], ['c'])
score = metric.compute()

What is the value of score?
class CustomMetric:
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, predictions, references):
        for p, r in zip(predictions, references):
            if p == r:
                self.correct += 1
            self.total += 1

    def compute(self):
        return self.correct / self.total if self.total > 0 else 0

metric = CustomMetric()
metric.update(['a', 'b'], ['a', 'x'])
metric.update(['c'], ['c'])
score = metric.compute()
Count total matches and total items after both updates.
First update: matches 'a'=='a' (1), total 2.
Second update: matches 'c'=='c' (1), total 1.
Total correct = 2, total = 3, so score = 2/3 = 0.666...
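Repeating the class from the question, the running tallies can be checked after each update to confirm the walkthrough above:

```python
class CustomMetric:
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, predictions, references):
        for p, r in zip(predictions, references):
            if p == r:
                self.correct += 1
            self.total += 1  # counted for every pair, matching or not

    def compute(self):
        return self.correct / self.total if self.total > 0 else 0

metric = CustomMetric()
metric.update(['a', 'b'], ['a', 'x'])
assert (metric.correct, metric.total) == (1, 2)  # one match of two pairs
metric.update(['c'], ['c'])
assert (metric.correct, metric.total) == (2, 3)  # one more match, one more pair
score = metric.compute()  # 2/3 = 0.666...
```

Because update accumulates state across calls, the metric gives the same answer whether the data arrives in one batch or several.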