LangChain framework · ~20 mins

Custom evaluation metrics in LangChain - Practice Problems & Coding Challenges

Challenge - 5 Problems
Component Behavior · intermediate
What output does this custom metric function produce?

Consider this Python function used as a custom evaluation metric in LangChain:

def custom_metric(predictions, references):
    correct = sum(p == r for p, r in zip(predictions, references))
    total = len(references)
    return correct / total if total > 0 else 0

What is the output of custom_metric(['a', 'b', 'c'], ['a', 'x', 'c'])?

A) 0
B) 0.3333333333333333
C) 0.6666666666666666
D) 1.0
💡 Hint

Count how many predictions match the references exactly, then divide by total items.
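A quick sketch of the matching logic, using different sample data so the challenge answer stays hidden:

```python
# Sample data (not the challenge's input).
preds = ['cat', 'dog', 'bird', 'fish']
refs  = ['cat', 'dog', 'worm', 'fish']

# zip pairs items position by position; sum() counts the True comparisons,
# since True counts as 1 and False as 0.
matches = sum(p == r for p, r in zip(preds, refs))
accuracy = matches / len(refs)
print(matches, accuracy)  # 3 matches out of 4 -> 0.75
```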

📝 Syntax · intermediate
Which option correctly defines a custom metric function in LangChain?

Which of the following Python functions correctly defines a custom evaluation metric that returns the ratio of matching items between predictions and references?

A)
def metric(predictions, references):
    return sum(p == r for p, r in zip(predictions, references)) * len(references)
B)
def metric(predictions, references):
    return sum(p == r for p in predictions for r in references) / len(references)
C)
def metric(predictions, references):
    return sum(predictions == references) / len(references)
D)
def metric(predictions, references):
    return sum(p == r for p, r in zip(predictions, references)) / len(references)
💡 Hint

Use zip to pair predictions and references correctly.
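A short illustration (with neutral sample data) of why position-wise pairing matters: `zip` compares item i of one list to item i of the other, while a nested comprehension compares every prediction against every reference and inflates the count.

```python
preds = ['a', 'b']
refs  = ['b', 'a']

# Position-wise pairing: preds[0] vs refs[0], preds[1] vs refs[1].
paired = sum(p == r for p, r in zip(preds, refs))
print(paired)  # 0 -- nothing matches at the same position

# Cross-product comparison: every p against every r.
cross = sum(p == r for p in preds for r in refs)
print(cross)  # 2 -- 'a' matches refs[1], 'b' matches refs[0]
```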

🔧 Debug · advanced
What error does this custom metric code raise?

Given this custom metric function:

def metric(predictions, references):
    return sum(p == r for p, r in zip(predictions, references)) / len(predictions)

What error will occur if predictions is an empty list and references is non-empty?

Example call:

result = metric([], ['a', 'b'])
A) ZeroDivisionError
B) IndexError
C) TypeError
D) No error, returns 0
💡 Hint

Check what happens when dividing by the length of an empty list.
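One common fix is to guard the division, as a minimal sketch (the function name `safe_metric` is illustrative, not a LangChain API):

```python
def safe_metric(predictions, references):
    # Guard: dividing by len() of an empty list would raise an exception,
    # so return 0.0 up front when there is nothing to score.
    if not predictions:
        return 0.0
    return sum(p == r for p, r in zip(predictions, references)) / len(predictions)

print(safe_metric([], ['a', 'b']))          # 0.0 instead of an exception
print(safe_metric(['a', 'b'], ['a', 'b']))  # 1.0
```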

🧠 Conceptual · advanced
Why use custom evaluation metrics in LangChain?

Which reason best explains why you might create a custom evaluation metric instead of using built-in ones in LangChain?

A) To measure specific qualities of your model's output that built-in metrics don't capture
B) Because built-in metrics are always inaccurate and unreliable
C) Because LangChain requires custom metrics for all models
D) To make the evaluation run faster by avoiding built-in functions
💡 Hint

Think about why general metrics might not fit every use case.
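As a hypothetical example of a quality built-in accuracy metrics don't capture: a custom metric could check whether generated answers respect a length budget. (The function `within_length_budget` is invented for illustration, not part of LangChain.)

```python
def within_length_budget(predictions, max_words=5):
    # Fraction of predictions that stay within a word-count budget --
    # a use-case-specific quality no generic accuracy metric measures.
    if not predictions:
        return 0.0
    ok = sum(len(p.split()) <= max_words for p in predictions)
    return ok / len(predictions)

preds = ['short answer', 'this reply is far too long for the budget']
print(within_length_budget(preds))  # 0.5 -- only the first fits
```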

State Output · expert
What is the final value of score after running this custom metric?

Consider this code snippet used in LangChain to evaluate predictions:

class CustomMetric:
    def __init__(self):
        self.correct = 0
        self.total = 0
    def update(self, predictions, references):
        for p, r in zip(predictions, references):
            if p == r:
                self.correct += 1
            self.total += 1
    def compute(self):
        return self.correct / self.total if self.total > 0 else 0

metric = CustomMetric()
metric.update(['a', 'b'], ['a', 'x'])
metric.update(['c'], ['c'])
score = metric.compute()

What is the value of score?

A) 0.5
B) 0.6666666666666666
C) 1.0
D) 0.0
💡 Hint

Count total matches and total items after both updates.
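The same stateful pattern, shown with different data so the challenge answer stays hidden: each `update()` call adds to the running counts, and `compute()` reads the accumulated totals.

```python
class RunningAccuracy:
    """Accumulates match counts across multiple update() calls."""
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, predictions, references):
        for p, r in zip(predictions, references):
            if p == r:
                self.correct += 1
            self.total += 1

    def compute(self):
        return self.correct / self.total if self.total > 0 else 0

m = RunningAccuracy()
m.update(['x', 'y'], ['x', 'y'])  # 2 correct out of 2
m.update(['z', 'w'], ['q', 'w'])  # 1 correct out of 2
print(m.compute())  # 3 / 4 = 0.75
```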