LangChain framework · ~10 mins

LangSmith evaluators in LangChain - Step-by-Step Execution

Concept Flow - LangSmith evaluators
Input: Model Output + Reference
Evaluator Receives Data
Run Evaluation Logic
Generate Score or Feedback
Return Evaluation Result
The evaluator takes the model output and reference, runs evaluation logic, and returns a score or feedback.
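The flow above can be sketched as a plain-Python toy evaluator. This is a hypothetical `evaluate_string` helper built on the standard library's `difflib`, not the LangChain API, but it mirrors the same steps: receive data, run evaluation logic, generate a score and feedback, return the result.

```python
from difflib import SequenceMatcher

def evaluate_string(prediction: str, reference: str) -> dict:
    """Toy evaluator mirroring the flow above: receive data,
    run comparison logic, produce a score, return a result."""
    # Run evaluation logic: ratio() is 1.0 for identical strings.
    score = SequenceMatcher(None, prediction, reference).ratio()
    feedback = "Close match" if score >= 0.8 else "Poor match"
    # Return the evaluation result as a dict.
    return {"score": round(score, 2), "feedback": feedback}

result = evaluate_string("Hello world", "Hello world!")
```

Here `SequenceMatcher.ratio()` plays the role of the evaluation logic: it returns 1.0 for identical strings and a value just below 1 for near matches.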
Execution Sample
LangChain
from langchain.evaluation import load_evaluator

# StringEvaluator is an abstract base class in LangChain, so load a
# concrete string evaluator instead (here, a string-distance evaluator).
evaluator = load_evaluator("string_distance")
result = evaluator.evaluate_strings(
    prediction="Hello world",
    reference="Hello world!",
)
This code creates a string evaluator and evaluates a prediction against a reference string; the returned result is a dict containing the score.
Execution Table
| Step | Action | Input | Evaluation Logic | Output |
|---|---|---|---|---|
| 1 | Create evaluator instance | None | Initialize evaluator | Evaluator ready |
| 2 | Call evaluate_strings | prediction="Hello world", reference="Hello world!" | Compare strings with tolerance | Score calculated |
| 3 | Return evaluation result | Score calculated | Format result | {"score": 0.9, "feedback": "Close match"} |
| 4 | End | Evaluation complete | No further action | Evaluation finished |
💡 Evaluation completes after returning the score and feedback.
Variable Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | Final |
|---|---|---|---|---|---|
| evaluator | None | StringEvaluator instance | StringEvaluator instance | StringEvaluator instance | StringEvaluator instance |
| prediction | None | None | "Hello world" | "Hello world" | "Hello world" |
| reference | None | None | "Hello world!" | "Hello world!" | "Hello world!" |
| result | None | None | None | {"score": 0.9, "feedback": "Close match"} | {"score": 0.9, "feedback": "Close match"} |
Key Moments - 2 Insights
Why does the evaluator return a score less than 1 even though the prediction looks very similar to the reference?
Because the comparison logic allows some tolerance rather than requiring an exact match, the missing exclamation mark introduces a slight difference, yielding a score below 1, as shown in steps 2 and 3 of the execution table.
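The effect of the missing exclamation mark can be checked with a quick ratio computation. This is a sketch using the standard library's `difflib`, not the evaluator's actual scoring logic:

```python
from difflib import SequenceMatcher

# "Hello world" vs. "Hello world!" differ only by the trailing "!",
# so a ratio-based similarity lands just below a perfect 1.0.
score = SequenceMatcher(None, "Hello world", "Hello world!").ratio()
```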
What happens if the prediction or reference is None or empty?
The evaluator may still run but will typically return a low or zero score, or raise an error, because the comparison logic expects non-empty strings; the Input column of step 2 implies that valid strings are required.
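One defensive pattern is to guard the inputs before comparing. This is a hypothetical `safe_evaluate` wrapper, assuming `difflib`-style scoring rather than LangChain's internals:

```python
from difflib import SequenceMatcher

def safe_evaluate(prediction, reference):
    # Guard against None or empty inputs before running the comparison.
    if not prediction or not reference:
        return {"score": 0.0, "feedback": "Missing prediction or reference"}
    score = SequenceMatcher(None, prediction, reference).ratio()
    return {"score": score, "feedback": "Compared"}
```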
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution table, what is the output after step 3?
A. {"score": 0.9, "feedback": "Close match"}
B. Evaluator ready
C. Score calculated
D. Evaluation finished
💡 Hint
Check the Output column in the row for Step 3.
At which step does the evaluator compare the prediction and reference strings?
A. Step 1
B. Step 3
C. Step 2
D. Step 4
💡 Hint
Look at the Evaluation Logic column to find where comparison happens.
If the prediction were exactly the same as the reference, how would the score in the execution table change?
A. Score would be 0
B. Score would be 1
C. Score would be 0.5
D. Score would be negative
💡 Hint
Consider the meaning of a perfect match in evaluation logic at step 2.
Concept Snapshot
LangSmith evaluators compare model outputs to references.
They run evaluation logic to produce scores or feedback.
Use evaluator methods like evaluate_strings for text.
Outputs help measure model accuracy or quality.
Similarity-style scores range from 0 (no match) to 1 (perfect match); distance-based evaluators invert this, with 0 meaning identical.
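The snapshot's scoring claim can be illustrated with a similarity-style score. This is a `difflib` sketch, not LangChain's scoring logic:

```python
from difflib import SequenceMatcher

def similarity(prediction: str, reference: str) -> float:
    # Ratio-style similarity: exactly 1.0 only for an exact match.
    return SequenceMatcher(None, prediction, reference).ratio()

perfect = similarity("Hello world", "Hello world")   # exact match
partial = similarity("Hello world", "Hello world!")  # one character off
```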
Full Transcript
LangSmith evaluators are tools that check how well a model's output matches a reference answer. The process starts by giving the evaluator the model's prediction and the correct reference. The evaluator then runs its logic to compare these two inputs. This comparison results in a score and sometimes feedback explaining the quality. For example, a StringEvaluator compares text strings and returns a score close to 1 if they are very similar. The evaluation ends by returning this score and feedback. This helps developers understand how accurate or good their model's outputs are.