What are custom evaluation metrics in LangChain?


Introduction

Custom evaluation metrics let you measure how well your AI application or language model performs, using the criteria that matter most to your project. They are useful:

When you want to check whether your AI's answers are accurate for your specific domain.

When default scores don't show the full picture of your model's performance.

When you need to compare different AI models using your own rules.

When you want to track improvements against your own goals.

When you want to give feedback to your AI system in a form that fits your needs.
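Comparing models or tracking progress usually means averaging a per-example metric over a set of examples. The sketch below assumes a plain scoring function (`exact_match`, a hypothetical helper that mirrors the ExactMatch metric in the Examples section) and a small, made-up set of reference answers and model outputs. It is an illustration of the averaging step, not real evaluation data.

```python
from statistics import mean
from typing import Callable

# A metric takes (prediction, reference) and returns a score, ideally in [0, 1]
Metric = Callable[[str, str], float]


def exact_match(prediction: str, reference: str) -> float:
    """1.0 for identical strings, otherwise 0.0."""
    return 1.0 if prediction == reference else 0.0


def average_score(metric: Metric, predictions: list[str], references: list[str]) -> float:
    """Mean of the per-example scores over a whole dataset."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    return mean(metric(p, r) for p, r in zip(predictions, references))


# Illustrative toy data, not real model outputs
references = ["Paris", "4", "blue"]
model_a = ["Paris", "4", "red"]
model_b = ["Paris", "four", "green"]

print(f"Model A: {average_score(exact_match, model_a, references):.2f}")  # Model A: 0.67
print(f"Model B: {average_score(exact_match, model_b, references):.2f}")  # Model B: 0.33
```

Passing the metric in as a parameter means the same loop can compare models under any per-example score. Note that exact match gives "four" no credit against "4"; whether that is acceptable is exactly the kind of rule a custom metric lets you decide.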

Syntax

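```python
from typing import Any, Optional
from langchain.evaluation import StringEvaluator

class MyMetric(StringEvaluator):
    @property
    def requires_reference(self) -> bool:
        return True  # this metric compares against a reference answer

    def _evaluate_strings(self, *, prediction: str, reference: Optional[str] = None, **kwargs: Any) -> dict:
        score = 0.0  # your custom logic here
        return {"score": score}
```

Create a class that inherits from StringEvaluator, the base class for string metrics in LangChain's `langchain.evaluation` module (the import path can differ between LangChain versions, so check the one you have installed).

Implement the `_evaluate_strings` method and return a dictionary whose "score" key holds a numeric score. Callers run the metric through the public `evaluate_strings(prediction=..., reference=...)` method, which validates the arguments and then calls your method.

Overriding `requires_reference` to return True tells LangChain that a reference answer is mandatory, so calling the metric without one raises an error.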

Examples

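This metric returns a score of 1.0 if the prediction exactly matches the reference, otherwise 0.0. The comparison is case- and whitespace-sensitive.

```python
from typing import Any, Optional
from langchain.evaluation import StringEvaluator

class ExactMatch(StringEvaluator):
    @property
    def requires_reference(self) -> bool:
        return True  # exact match needs a reference answer

    def _evaluate_strings(self, *, prediction: str, reference: Optional[str] = None, **kwargs: Any) -> dict:
        return {"score": 1.0 if prediction == reference else 0.0}
```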
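Because exact match is strict, "Paris" and " paris." score 0 even though they name the same city. A common relaxation is to normalize both strings before comparing them. The sketch below uses plain functions; the names `normalize` and `normalized_exact_match` and the chosen rules (lowercase, strip ASCII punctuation, collapse whitespace) are illustrative assumptions, not part of LangChain. Logic like this could be called from a metric's `_evaluate_strings` method.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def normalized_exact_match(prediction: str, reference: str) -> float:
    """1.0 if the strings match after normalization, otherwise 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


print(normalized_exact_match("Paris", " paris."))  # 1.0
print(normalized_exact_match("Paris", "Lyon"))     # 0.0
```

Which rules to apply is a design choice: stripping punctuation helps for short factual answers but can hide real differences in answers where punctuation carries meaning, such as code or numbers like "3.5" versus "35".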

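This metric scores higher the closer the prediction's length is to the reference's length.

```python
from typing import Any, Optional
from langchain.evaluation import StringEvaluator

class LengthDifference(StringEvaluator):
    @property
    def requires_reference(self) -> bool:
        return True  # the score is relative to the reference length

    def _evaluate_strings(self, *, prediction: str, reference: Optional[str] = None, **kwargs: Any) -> dict:
        return {"score": 1.0 / (1 + abs(len(prediction) - len(reference)))}
```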
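To see how the formula 1 / (1 + |len(prediction) - len(reference)|) behaves, the plain-function sketch below (the name `length_score` is hypothetical) evaluates it on a few pairs. Equal lengths give 1.0, a one-character difference halves the score, and a three-character difference gives 0.25. The last line shows a limitation: length ignores content, so two different words of the same length still score 1.0.

```python
def length_score(prediction: str, reference: str) -> float:
    """1.0 for equal character counts, decreasing as the counts diverge."""
    return 1.0 / (1 + abs(len(prediction) - len(reference)))


print(length_score("cat", "cat"))     # 1.0  (difference 0)
print(length_score("cats", "cat"))    # 0.5  (difference 1)
print(length_score("kitten", "cat"))  # 0.25 (difference 3)
print(length_score("dog", "cat"))     # 1.0  (same length, different word)
```

Because of that blind spot, a length-based score works best as a supporting signal next to a content-based metric rather than as a measure of correctness on its own.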

Sample Program

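This example defines a simple similarity metric based on word overlap: the number of distinct words the prediction and reference share, divided by the number of distinct words in either of them (the Jaccard index of the two word sets). It then scores one prediction and prints the result.

```python
from typing import Any, Optional
from langchain.evaluation import StringEvaluator

class SimpleSimilarity(StringEvaluator):
    """Word-overlap (Jaccard) similarity between prediction and reference."""

    @property
    def requires_reference(self) -> bool:
        return True

    def _evaluate_strings(self, *, prediction: str, reference: Optional[str] = None, **kwargs: Any) -> dict:
        pred_words = set(prediction.lower().split())
        ref_words = set(reference.lower().split())
        common = pred_words.intersection(ref_words)
        total = pred_words.union(ref_words)
        # Guard against division by zero when both strings are empty
        return {"score": len(common) / len(total) if total else 0.0}

# Example usage: evaluate_strings validates the arguments, then calls _evaluate_strings
metric = SimpleSimilarity()
pred = "The quick brown fox"
ref = "The quick fox jumps"
result = metric.evaluate_strings(prediction=pred, reference=ref)
print(f"Similarity score: {result['score']:.2f}")
```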

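Output:

Similarity score: 0.60

The two sentences share three words (the, quick, fox) out of five distinct words overall, so the score is 3 / 5 = 0.60.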

Important Notes

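Scores are easiest to compare and aggregate when they are numbers between 0 and 1, where higher means better.

Keep metric logic simple and fast: it runs once per example, so a slow metric slows down every evaluation run.

Test your metric on edge cases such as empty strings, identical strings, and completely unrelated strings to make sure it behaves as expected.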
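One way to follow this advice is to keep the scoring logic in a plain function and check it against edge cases with ordinary assertions, with no model or LangChain call needed. The sketch below copies the word-overlap logic of the SimpleSimilarity example into a hypothetical `word_overlap` function.

```python
def word_overlap(prediction: str, reference: str) -> float:
    """Jaccard similarity of the lowercase word sets of two strings."""
    pred_words = set(prediction.lower().split())
    ref_words = set(reference.lower().split())
    total = pred_words | ref_words
    return len(pred_words & ref_words) / len(total) if total else 0.0


# Identical text (ignoring case) scores 1.0
assert word_overlap("The quick fox", "the QUICK fox") == 1.0
# No shared words scores 0.0
assert word_overlap("red apple", "blue sky") == 0.0
# Two empty strings hit the division-by-zero guard and score 0.0
assert word_overlap("", "") == 0.0
# Word order is ignored; only the sets of words matter
assert word_overlap("fox quick the", "the quick fox") == 1.0
# The Sample Program inputs share 3 of 5 distinct words
assert word_overlap("The quick brown fox", "The quick fox jumps") == 3 / 5
print("all checks passed")
```

Tests like these also make hidden decisions explicit. Scoring two empty strings as 0.0 is what the original code does; if you would rather treat two empty answers as a perfect match, a failing assertion is where you would notice and change it.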

Summary

Custom evaluation metrics let you measure AI results in your own way.

Define a class that inherits from LangChain's StringEvaluator and implement _evaluate_strings to return a score.

Use your metric to get scores that help improve your AI models.

Practice

1. What is the main purpose of creating a custom evaluation metric in LangChain?

A. To speed up the AI model training process

B. To measure AI results in a way that fits your specific needs

C. To automatically fix errors in AI outputs

D. To replace the AI model with a simpler one


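Solution

Step 1: Evaluation metrics score the quality of a model's outputs. They do not speed up training, fix errors, or replace the model, which rules out A, C, and D.

Step 2: Custom metrics exist so that scoring reflects the criteria that matter to your specific project.

Final Answer: B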
