Challenge - 5 Problems
LangSmith Evaluator Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Component Behavior
intermediate · 2:00 remaining
What is the output of this LangSmith evaluator code?
Consider this LangSmith evaluator snippet that scores a model's response based on keyword presence. What score does it produce?
```python
from langsmith.evaluation import Evaluator

class KeywordEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        keywords = reference.split()
        score = sum(1 for kw in keywords if kw in prediction) / len(keywords)
        return score

# Usage
evaluator = KeywordEvaluator()
prediction = "The quick brown fox jumps"
reference = "quick fox jumps high"
result = evaluator.evaluate(prediction, reference)
print(result)
```
Attempts: 2 left
💡 Hint
Count how many keywords from the reference appear in the prediction, then divide by total keywords.
✗ Incorrect
The reference has 4 keywords: 'quick', 'fox', 'jumps', 'high'. The prediction contains 'quick', 'fox', and 'jumps' but not 'high'. So 3 out of 4 keywords match, resulting in 0.75.
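The arithmetic can be checked with a standalone sketch of the same keyword-matching logic (plain Python, no langsmith import needed):

```python
def keyword_score(prediction: str, reference: str) -> float:
    # Fraction of reference keywords found in the prediction
    # (substring match, as in the snippet above).
    keywords = reference.split()
    return sum(1 for kw in keywords if kw in prediction) / len(keywords)

print(keyword_score("The quick brown fox jumps", "quick fox jumps high"))  # 3 of 4 keywords -> 0.75
```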
📝 Syntax
intermediate · 2:00 remaining
Which option causes a syntax error in defining a LangSmith evaluator?
Identify the code snippet that will raise a syntax error when defining a custom LangSmith evaluator class.
Attempts: 2 left
💡 Hint
Check the function parameter type annotations carefully.
✗ Incorrect
Option A omits the colon between 'prediction' and 'str' in the parameter list, so Python raises a SyntaxError.
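For reference, a correctly annotated signature puts a colon between each parameter name and its type (a minimal sketch, independent of the LangSmith API):

```python
# Correct: each parameter is annotated as `name: type`.
def evaluate(prediction: str, reference: str) -> float:
    return float(prediction == reference)

# Incorrect (SyntaxError): def evaluate(prediction str, reference: str) -> float:
```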
❓ State Output
advanced · 2:00 remaining
What is the value of 'score' after running this LangSmith evaluator code?
Given this evaluator code that uses a weighted scoring system, what is the final score returned?
```python
from langsmith.evaluation import Evaluator

class WeightedEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        weights = {'good': 2, 'bad': -1}
        score = 0
        for word in prediction.split():
            score += weights.get(word, 0)
        return score

# Usage
result = WeightedEvaluator().evaluate('good good bad unknown', 'reference')
```
Attempts: 2 left
💡 Hint
Sum the weights of each word in the prediction: 'good' adds 2, 'bad' adds -1, and unknown words add 0.
✗ Incorrect
The prediction has two 'good' words (2*2=4), one 'bad' (-1), and one 'unknown' (0). Total score = 4 - 1 + 0 = 3.
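The same weighted sum can be reproduced with a plain-Python sketch of the logic, independent of the Evaluator class:

```python
def weighted_score(prediction: str, weights: dict[str, int]) -> int:
    # Unknown words fall back to a weight of 0 via dict.get.
    return sum(weights.get(word, 0) for word in prediction.split())

print(weighted_score('good good bad unknown', {'good': 2, 'bad': -1}))  # 2 + 2 - 1 + 0 = 3
```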
🔧 Debug
advanced · 2:00 remaining
Which option causes a runtime error when using LangSmith evaluator?
Identify the code snippet that will raise a runtime error during evaluation.
Attempts: 2 left
💡 Hint
Check for division by zero errors.
✗ Incorrect
Option C divides by len(reference); when the reference string is empty, this is division by zero and raises a ZeroDivisionError at runtime.
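One way to defend against this failure mode is to guard the division before scoring — a minimal sketch, assuming the same keyword-matching logic as the earlier snippet:

```python
def safe_keyword_score(prediction: str, reference: str) -> float:
    keywords = reference.split()
    if not keywords:  # avoid ZeroDivisionError on an empty reference
        return 0.0
    return sum(1 for kw in keywords if kw in prediction) / len(keywords)

print(safe_keyword_score("quick fox", ""))       # 0.0 instead of ZeroDivisionError
print(safe_keyword_score("quick fox", "quick"))  # 1.0
```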
🧠 Conceptual
expert · 2:00 remaining
Which option best describes the role of LangSmith evaluators in model development?
Select the statement that correctly explains what LangSmith evaluators do.
Attempts: 2 left
💡 Hint
Think about evaluation and scoring roles in model workflows.
✗ Incorrect
LangSmith evaluators are designed to score model outputs by comparing them to references, helping improve model quality.
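As an illustration of that role, the simplest possible reference-comparison scorer — a plain-Python sketch, not the LangSmith API itself — looks like:

```python
def exact_match(prediction: str, reference: str) -> float:
    # Score 1.0 when the model output matches the reference exactly, else 0.0.
    return float(prediction.strip() == reference.strip())

print(exact_match("Paris", "Paris "))  # 1.0
print(exact_match("Paris", "London"))  # 0.0
```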