
Custom evaluation metrics in LangChain

Introduction

Custom evaluation metrics help you measure how well your language model is performing on the criteria that matter most to your project. They are useful in situations like these:

When you want to check whether your model's answers are accurate for your specific domain.
When default scores don't show the full picture of your model's performance.
When you need to compare different models using your own rules.
When you want to track improvements against your own goals.
When you want to feed scores back into your system in a form that fits your needs.
Syntax
LangChain
from langchain.evaluation import Evaluation

class MyMetric(Evaluation):
    def evaluate(self, prediction: str, reference: str) -> float:
        # Your custom logic here
        score = 0.0
        return score
Create a class that inherits from Evaluation.
Implement the evaluate method to return a numeric score.
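The pattern itself is just a class exposing an evaluate method, so you can prototype it without the framework. A minimal, framework-free sketch (the Evaluation stub below is a stand-in for illustration, not LangChain's class):

```python
class Evaluation:
    # Stand-in for the base class, so the pattern runs without LangChain
    # installed (assumption: your real base class comes from langchain).
    def evaluate(self, prediction: str, reference: str) -> float:
        raise NotImplementedError

class MyMetric(Evaluation):
    def evaluate(self, prediction: str, reference: str) -> float:
        # Replace this placeholder with your own scoring rule.
        score = 0.0
        return score

print(MyMetric().evaluate("hello", "hello"))  # placeholder metric always returns 0.0
```

Once the logic works in isolation, swap the stub for the real base class from your installed LangChain version.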
Examples
This metric returns 1 if the prediction exactly matches the reference, otherwise 0.
LangChain
from langchain.evaluation import Evaluation

class ExactMatch(Evaluation):
    def evaluate(self, prediction: str, reference: str) -> float:
        return 1.0 if prediction == reference else 0.0
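Note that the comparison is exact, so case and whitespace both matter. A usage sketch (again with a stand-in base class so it runs outside LangChain):

```python
class Evaluation:
    # Stand-in base class for illustration (assumption), not LangChain's.
    def evaluate(self, prediction: str, reference: str) -> float:
        raise NotImplementedError

class ExactMatch(Evaluation):
    def evaluate(self, prediction: str, reference: str) -> float:
        # 1.0 only when the strings are identical, character for character.
        return 1.0 if prediction == reference else 0.0

metric = ExactMatch()
print(metric.evaluate("Paris", "Paris"))  # 1.0 -- identical strings
print(metric.evaluate("Paris", "paris"))  # 0.0 -- case-sensitive comparison
```

If you want a looser match, normalize both strings (e.g. lowercase and strip) before comparing.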
This metric scores higher when the prediction length is closer to the reference length.
LangChain
from langchain.evaluation import Evaluation

class LengthDifference(Evaluation):
    def evaluate(self, prediction: str, reference: str) -> float:
        return 1.0 / (1 + abs(len(prediction) - len(reference)))
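The formula 1 / (1 + |length difference|) gives exactly 1.0 when the lengths match and decays toward 0 as they diverge. A quick check of those two cases (with a stand-in base class for illustration):

```python
class Evaluation:
    # Stand-in base class for illustration (assumption), not LangChain's.
    def evaluate(self, prediction: str, reference: str) -> float:
        raise NotImplementedError

class LengthDifference(Evaluation):
    def evaluate(self, prediction: str, reference: str) -> float:
        # 1 / (1 + |len difference|): 1.0 for equal lengths, toward 0 otherwise.
        return 1.0 / (1 + abs(len(prediction) - len(reference)))

metric = LengthDifference()
print(metric.evaluate("abcd", "abcd"))  # 1.0  -- equal lengths
print(metric.evaluate("abcd", "a"))     # 0.25 -- difference of 3, so 1/(1+3)
```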
Sample Program

This example defines a simple similarity metric that compares how many words overlap between prediction and reference. It then prints the similarity score.

LangChain
from langchain.evaluation import Evaluation

class SimpleSimilarity(Evaluation):
    def evaluate(self, prediction: str, reference: str) -> float:
        pred_words = set(prediction.lower().split())
        ref_words = set(reference.lower().split())
        common = pred_words.intersection(ref_words)
        total = pred_words.union(ref_words)
        return len(common) / len(total) if total else 0.0

# Example usage
metric = SimpleSimilarity()
pred = "The quick brown fox"
ref = "The quick fox jumps"
score = metric.evaluate(pred, ref)
print(f"Similarity score: {score:.2f}")
Output
Similarity score: 0.60
Important Notes

Custom metrics should return a number, usually between 0 and 1, where higher means better.
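If your raw score can fall outside that range, clamping it keeps comparisons across metrics consistent. A small helper sketch (a generic utility, not a LangChain API):

```python
def clamp01(score: float) -> float:
    # Clamp an arbitrary raw score into the conventional [0, 1] range.
    return max(0.0, min(1.0, score))

print(clamp01(1.7))   # 1.0  -- values above the range are capped
print(clamp01(-0.2))  # 0.0  -- values below the range are floored
print(clamp01(0.42))  # 0.42 -- in-range values pass through unchanged
```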

Keep your metric logic simple and fast for better performance.

Test your metric with different inputs to make sure it behaves as expected.
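Testing can be as simple as a few assertions covering the edge cases. For example, for the word-overlap logic used in the sample program above (rewritten here as a standalone function so it runs on its own):

```python
def simple_similarity(prediction: str, reference: str) -> float:
    # Same word-overlap (Jaccard) logic as the SimpleSimilarity metric.
    pred_words = set(prediction.lower().split())
    ref_words = set(reference.lower().split())
    common = pred_words & ref_words
    total = pred_words | ref_words
    return len(common) / len(total) if total else 0.0

# Edge cases worth covering for any custom metric:
assert simple_similarity("same text", "same text") == 1.0  # identical -> top score
assert simple_similarity("abc", "xyz") == 0.0              # no overlap -> bottom score
assert simple_similarity("", "") == 0.0                    # empty inputs handled safely
assert simple_similarity("Hello", "hello") == 1.0          # lowercasing makes it case-insensitive
print("all metric checks passed")
```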

Summary

Custom evaluation metrics let you measure AI results in your own way.

Define a class inheriting from Evaluation and implement evaluate.

Use your metric to get scores that help improve your AI models.