LangChain framework · ~10 mins

Custom evaluation metrics in LangChain - Step-by-Step Execution

Concept Flow - Custom evaluation metrics
Define metric function
Integrate metric into evaluation
Run evaluation with metric
Collect metric results
Analyze and use results
This flow shows how to create a custom metric, add it to LangChain evaluation, run it, and get results.
Execution Sample
LangChain
from langchain.evaluation import Evaluation

# Custom metric: 1 for an exact match, 0 otherwise
def custom_metric(prediction, reference):
    return 1 if prediction == reference else 0

# Run the evaluation, passing the custom metric in the metrics list
result = Evaluation.evaluate(
    predictions=["yes", "no"],
    references=["yes", "yes"],
    metrics=[custom_metric],
)
Defines a simple metric that checks exact match, then runs evaluation with predictions and references.
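The per-pair scoring and averaging that the evaluation performs can be sketched in plain Python. The `evaluate` helper below is a hypothetical stand-in for illustration, not a LangChain API:

```python
def custom_metric(prediction, reference):
    # Binary score: 1 for an exact match, 0 otherwise
    return 1 if prediction == reference else 0

def evaluate(predictions, references, metrics):
    # Call each metric on every prediction-reference pair,
    # then average the per-pair scores into one final result.
    results = {}
    for metric in metrics:
        scores = [metric(p, r) for p, r in zip(predictions, references)]
        results[metric.__name__] = sum(scores) / len(scores)
    return results

result = evaluate(["yes", "no"], ["yes", "yes"], [custom_metric])
print(result)  # {'custom_metric': 0.5}
```

This mirrors the execution table below: two metric calls yielding [1, 0], then aggregation to an average of 0.5.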
Execution Table
Step | Action | Input | Metric Result | Notes
1 | Call custom_metric | prediction='yes', reference='yes' | 1 | Exact match returns 1
2 | Call custom_metric | prediction='no', reference='yes' | 0 | Mismatch returns 0
3 | Aggregate results | [1, 0] | Average=0.5 | Average metric score computed
4 | Return evaluation result | metrics=[custom_metric] | 0.5 | Final evaluation output
5 | End | - | - | Evaluation complete
💡 All predictions processed; evaluation returns average metric score
Variable Tracker
Variable | Start | After 1 | After 2 | Final
prediction | - | 'yes' | 'no' | -
reference | - | 'yes' | 'yes' | -
metric_result | - | 1 | 0 | [1, 0]
final_score | - | - | - | 0.5
Key Moments - 2 Insights
Why does the metric return 1 or 0 instead of a percentage?
The metric function returns 1 for an exact match and 0 otherwise, so each item's score is binary. The average across items, computed during aggregation, becomes the overall score (see execution table rows 1-3).
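A custom metric is not forced to be binary; it can return any numeric score. As a sketch, a graded metric might score partial matches as a fraction (word overlap is just an illustrative choice, not a LangChain built-in):

```python
def word_overlap_metric(prediction, reference):
    # Fractional score: share of reference words that appear in the prediction
    ref_words = set(reference.split())
    if not ref_words:
        return 0.0
    pred_words = set(prediction.split())
    return len(pred_words & ref_words) / len(ref_words)

print(word_overlap_metric("the cat sat", "the cat sat down"))  # 0.75
```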
How does LangChain use the custom metric function?
LangChain calls the custom metric on each prediction-reference pair, collects the per-pair scores, and then aggregates them into a final result (execution table rows 1-3).
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table: what is the metric result when prediction='no' and reference='yes'?
A. 1
B. 0
C. 0.5
D. Undefined
💡 Hint
Check step 2 in the execution table, where prediction='no' and reference='yes'
At which step does the evaluation compute the average metric score?
A. Step 1
B. Step 2
C. Step 3
D. Step 4
💡 Hint
Look for 'Aggregate results' in the execution table
If the predictions were all correct, what would the final_score be?
A. 1
B. 0.5
C. 0
D. Cannot tell
💡 Hint
Check the variable tracker: final_score is the average, so it would be 1 when every metric result is 1
Concept Snapshot
Custom evaluation metrics in LangChain:
- Define a function taking prediction and reference
- Return a numeric score (e.g., 1 for a match, 0 for no match)
- Pass function in metrics list to Evaluation.evaluate
- LangChain runs metric on each pair and aggregates
- Use results to understand model performance
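The last point, using the results, might look like this minimal sketch. The `check_score` helper and the 0.8 threshold are illustrative assumptions, not part of LangChain:

```python
def check_score(final_score, threshold=0.8):
    # Flag evaluation runs whose average metric score falls below the bar
    # (threshold=0.8 is an arbitrary assumed quality bar)
    label = "meets" if final_score >= threshold else "below"
    return f"Model {label} threshold: {final_score:.2f} vs {threshold}"

print(check_score(0.5))  # Model below threshold: 0.50 vs 0.8
print(check_score(1.0))  # Model meets threshold: 1.00 vs 0.8
```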
Full Transcript
This visual execution shows how to create and use custom evaluation metrics in LangChain. First, you define a metric function that compares a prediction to a reference and returns a score. Then you pass this function to LangChain's Evaluation.evaluate method along with lists of predictions and references. LangChain calls your metric on each pair, collects the results, and computes an average score. The execution table traces each call and the aggregation step, and the variable tracker shows how values change during evaluation. The key moments clarify why the metric returns 1 or 0 and how LangChain uses it, while the quiz tests understanding of metric results and aggregation. This helps beginners see, step by step, how custom metrics work in LangChain evaluation.