LangChain framework · ~20 mins

LangSmith evaluators in LangChain - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
LangSmith Evaluator Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Component Behavior
intermediate
2:00 remaining
What is the output of this LangSmith evaluator code?
Consider this LangSmith evaluator snippet that scores a model's response based on keyword presence. What score does it produce?
from langsmith.evaluation import Evaluator

class KeywordEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        keywords = reference.split()
        score = sum(1 for kw in keywords if kw in prediction) / len(keywords)
        return score

# Usage
evaluator = KeywordEvaluator()
prediction = "The quick brown fox jumps"
reference = "quick fox jumps high"
result = evaluator.evaluate(prediction, reference)
print(result)
A. 0.5
B. 0.75
C. 1.0
D. 0.25
Attempts: 2 left
💡 Hint
Count how many keywords from the reference appear in the prediction, then divide by total keywords.
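The hint's computation can be sketched in plain Python with no LangSmith dependency. This sketch uses different strings than the problem so it doesn't give the answer away; it also highlights that `kw in prediction` is a substring check, which is a caveat of this scoring approach.

```python
# Standalone sketch of keyword-overlap scoring (plain Python, no LangSmith).
prediction = "cats sleep all day"
reference = "cats sleep at night"

keywords = reference.split()  # ['cats', 'sleep', 'at', 'night']
# Caveat: `kw in prediction` is a substring check, so 'at' matches inside 'cats'.
hits = sum(1 for kw in keywords if kw in prediction)  # 'night' is absent -> 3 hits
score = hits / len(keywords)
print(score)  # -> 0.75
```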
📝 Syntax
intermediate
2:00 remaining
Which option causes a syntax error in defining a LangSmith evaluator?
Identify the code snippet that will raise a syntax error when defining a custom LangSmith evaluator class.
A
class MyEvaluator(Evaluator):
    def evaluate(self, prediction str, reference: str) -> float:
        return 1.0
B
class MyEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str):
        return 1.0
C
class MyEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return 1.0
D
class MyEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return 1.0
Attempts: 2 left
💡 Hint
Check the function parameter type annotations carefully.
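One way to verify which snippet is malformed without running it is Python's built-in `compile()`, which parses source and raises `SyntaxError` without executing anything. A minimal sketch (the source string below mirrors the flawed annotation style, not real LangSmith code):

```python
# compile() parses source without executing it, surfacing SyntaxError early.
bad_src = (
    "def evaluate(self, prediction str, reference: str) -> float:\n"
    "    return 1.0\n"
)
try:
    compile(bad_src, "<quiz-option>", "exec")
    print("parsed OK")
except SyntaxError:
    # The type-annotation colon after 'prediction' is missing.
    print("SyntaxError: 'prediction str' needs a colon ('prediction: str')")
```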
State Output
advanced
2:00 remaining
What is the value of 'score' after running this LangSmith evaluator code?
Given this evaluator code that uses a weighted scoring system, what is the final score returned?
from langsmith.evaluation import Evaluator

class WeightedEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        weights = {'good': 2, 'bad': -1}
        score = 0
        for word in prediction.split():
            score += weights.get(word, 0)
        return score

# Usage
result = WeightedEvaluator().evaluate('good good bad unknown', 'reference')
A. 3
B. 0
C. 1
D. 2
Attempts: 2 left
💡 Hint
Sum the weights for each word in the prediction: 'good' counts 2, 'bad' counts -1, and unknown words count 0.
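The hint's arithmetic can be reproduced with a one-liner over `dict.get` (plain Python, no LangSmith dependency), again with a different input string so the quiz answer isn't given away:

```python
# dict.get(word, 0) returns the word's weight, defaulting to 0 for unknown words.
weights = {'good': 2, 'bad': -1}
prediction = 'bad bad good mystery'
score = sum(weights.get(word, 0) for word in prediction.split())
print(score)  # -> 0  (-1 - 1 + 2 + 0)
```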
🔧 Debug
advanced
2:00 remaining
Which option causes a runtime error when using LangSmith evaluator?
Identify the code snippet that will raise a runtime error during evaluation.
A
class Eval(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return len(prediction) * len(reference)

Eval().evaluate('test', '')
B
class Eval(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return len(prediction) + len(reference)

Eval().evaluate('test', '')
C
class Eval(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return len(prediction) / len(reference)

Eval().evaluate('test', '')
D
class Eval(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return len(prediction) - len(reference)

Eval().evaluate('test', '')
Attempts: 2 left
💡 Hint
Check for division by zero errors.
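A defensive version of the ratio-style evaluator would guard against an empty reference before dividing. A minimal sketch (`safe_ratio` is a hypothetical helper for illustration, not part of LangSmith):

```python
def safe_ratio(prediction: str, reference: str) -> float:
    """Length ratio that avoids ZeroDivisionError on an empty reference."""
    if not reference:  # len(reference) == 0 would divide by zero
        return 0.0
    return len(prediction) / len(reference)

print(safe_ratio('test', ''))    # -> 0.0 instead of raising
print(safe_ratio('test', 'ab'))  # -> 2.0
```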
🧠 Conceptual
expert
2:00 remaining
Which option best describes the role of LangSmith evaluators in model development?
Select the statement that correctly explains what LangSmith evaluators do.
A. They automatically generate training data for language models without human input.
B. They convert language models into different programming languages for compatibility.
C. They deploy language models to production environments with optimized latency.
D. They provide a way to score and assess model outputs against references to improve model quality.
Attempts: 2 left
💡 Hint
Think about evaluation and scoring roles in model workflows.