
LangSmith evaluators in LangChain - Practice Problems & Coding Challenges

Challenge - 5 Problems
Component behavior
intermediate
What is the output of this LangSmith evaluator code?
Consider this LangSmith evaluator snippet that scores a model's response based on keyword presence. What score does it produce?
from langsmith.evaluation import Evaluator

class KeywordEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        keywords = reference.split()
        score = sum(1 for kw in keywords if kw in prediction) / len(keywords)
        return score

# Usage
evaluator = KeywordEvaluator()
prediction = "The quick brown fox jumps"
reference = "quick fox jumps high"
result = evaluator.evaluate(prediction, reference)
print(result)
A. 0.5
B. 0.75
C. 1.0
D. 0.25
💡 Hint
Count how many keywords from the reference appear in the prediction, then divide by total keywords.
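One detail worth keeping in mind when counting: `kw in prediction` is a substring test on the whole prediction string, not a whole-word test. The sketch below contrasts the two behaviours. It uses a plain function and made-up strings; the name `keyword_score` and the `whole_words` flag are assumptions for illustration and are not part of any LangSmith API.

```python
def keyword_score(prediction: str, reference: str, whole_words: bool = False) -> float:
    """Fraction of reference keywords found in the prediction."""
    keywords = reference.split()
    if not keywords:
        return 0.0  # avoid dividing by zero when the reference is empty
    # Substring mode searches the raw string; whole-word mode searches the set of tokens.
    haystack = set(prediction.split()) if whole_words else prediction
    return sum(1 for kw in keywords if kw in haystack) / len(keywords)


# "cat" is a substring of "cathedral" but not a word of the prediction.
substring_score = keyword_score("the cathedral is old", "cat old")
whole_word_score = keyword_score("the cathedral is old", "cat old", whole_words=True)
print(substring_score, whole_word_score)  # 1.0 0.5
```

The evaluator in the question uses the substring behaviour, so partial matches inside longer words would also count.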
📝 Syntax
intermediate
Which option causes a syntax error in defining a LangSmith evaluator?
Identify the code snippet that will raise a syntax error when defining a custom LangSmith evaluator class.
A
class MyEvaluator(Evaluator):
    def evaluate(self, prediction str, reference: str) -> float:
        return 1.0
B
class MyEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str):
        return 1.0
C
class MyEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return 1.0
D
class MyEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return 1.0
💡 Hint
Check the function parameter type annotations carefully.
State output
advanced
What is the value of 'score' after running this LangSmith evaluator code?
Given this evaluator code that uses a weighted scoring system, what is the final score returned?
from langsmith.evaluation import Evaluator

class WeightedEvaluator(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        weights = {'good': 2, 'bad': -1}
        score = 0
        for word in prediction.split():
            score += weights.get(word, 0)
        return score

# Usage
result = WeightedEvaluator().evaluate('good good bad unknown', 'reference')
A. 3
B. 0
C. 1
D. 2
💡 Hint
Add weights for each word in prediction: 'good' counts 2, 'bad' counts -1, unknown counts 0.
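The lookup `weights.get(word, 0)` is what lets unlisted words pass through without changing the total. A small sketch of that accumulation on a different input follows; the function name `weighted_score` and the sample sentence are made up for illustration.

```python
def weighted_score(prediction: str, weights: dict[str, int]) -> int:
    """Sum the weight of every whitespace-separated word; unlisted words count as 0."""
    return sum(weights.get(word, 0) for word in prediction.split())


weights = {"good": 2, "bad": -1}
sentence = "good okay bad bad bad"

# Per-word contributions, to make the running total easy to follow.
trace = [(word, weights.get(word, 0)) for word in sentence.split()]
total = weighted_score(sentence, weights)
print(trace)
print(total)  # -1
```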
🔧 Debug
advanced
Which option causes a runtime error when using a LangSmith evaluator?
Identify the code snippet that will raise a runtime error during evaluation.
A
class Eval(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return len(prediction) * len(reference)

Eval().evaluate('test', '')
B
class Eval(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return len(prediction) + len(reference)

Eval().evaluate('test', '')
C
class Eval(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return len(prediction) / len(reference)

Eval().evaluate('test', '')
D
class Eval(Evaluator):
    def evaluate(self, prediction: str, reference: str) -> float:
        return len(prediction) - len(reference)

Eval().evaluate('test', '')
💡 Hint
Check for division by zero errors.
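A ratio-style metric can be protected against this failure with an explicit check before dividing. The sketch below is one possible guard, not a prescribed fix: the function name `length_ratio` and the choice of `0.0` as the fallback are assumptions for illustration.

```python
def length_ratio(prediction: str, reference: str) -> float:
    """Prediction length divided by reference length, guarded against an empty reference."""
    if not reference:
        # Assumed fallback for illustration; raising ValueError is an equally valid choice.
        return 0.0
    return len(prediction) / len(reference)


# The unguarded expression fails when the reference is empty.
try:
    len("test") / len("")
    unguarded_error = None
except ZeroDivisionError as exc:
    unguarded_error = type(exc).__name__

guarded = length_ratio("test", "")
print(unguarded_error, guarded)  # ZeroDivisionError 0.0
```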
🧠 Conceptual
expert
Which option best describes the role of LangSmith evaluators in model development?
Select the statement that correctly explains what LangSmith evaluators do.
A. They automatically generate training data for language models without human input.
B. They convert language models into different programming languages for compatibility.
C. They deploy language models to production environments with optimized latency.
D. They provide a way to score and assess model outputs against references to improve model quality.
💡 Hint
Think about evaluation and scoring roles in model workflows.

Practice

1. What is the main purpose of LangSmith evaluators in LangChain?
easy
A. To check how good AI outputs are by comparing predictions to references
B. To train new AI models from scratch
C. To store large datasets for AI training
D. To create user interfaces for AI applications

Solution

  1. Step 1: Understand the role of evaluators

    LangSmith evaluators are designed to assess AI outputs by comparing them with expected answers.
  2. Step 2: Identify the correct purpose

    They do not train models, store data, or build interfaces but focus on evaluation.
  3. Final Answer:

    To check how good AI outputs are by comparing predictions to references -> Option A
  4. Quick Check:

    Evaluator purpose = Checking AI output quality [OK]
Hint: Evaluators compare AI answers to references to check quality
Common Mistakes:
  • Confusing evaluators with training tools
  • Thinking evaluators store data
  • Assuming evaluators build UI
2. Which of the following is the correct way to call an evaluator's evaluate method in LangSmith?
easy
A. evaluate(evaluator, prediction, reference)
B. evaluator.evaluate(prediction, reference)
C. evaluator.run(reference, prediction)
D. evaluate(prediction, reference, evaluator)

Solution

  1. Step 1: Recall method usage

    The evaluate method is called on the evaluator object with prediction and reference as arguments.
  2. Step 2: Match correct syntax

    Option B, evaluator.evaluate(prediction, reference), matches this pattern exactly. The other options either call evaluate as a standalone function or use a different method name with the arguments reversed.
  3. Final Answer:

    evaluator.evaluate(prediction, reference) -> Option B
  4. Quick Check:

    Method call = evaluator.evaluate(prediction, reference) [OK]
Hint: Call evaluate on evaluator with prediction and reference
Common Mistakes:
  • Swapping argument order
  • Calling evaluate as a standalone function
  • Using wrong method name like run
3. Given the code snippet:
evaluator = SomeEvaluator()
prediction = "The sky is blue."
reference = "The sky is clear and blue."
result = evaluator.evaluate(prediction, reference)
print(result)

What is the expected behavior of print(result)?
medium
A. It prints the reference string unchanged
B. It prints the prediction string unchanged
C. It prints a score or feedback comparing prediction to reference
D. It raises a syntax error because evaluate needs more arguments

Solution

  1. Step 1: Understand evaluate output

    The evaluate method returns a score or feedback describing how closely the prediction matches the reference.
  2. Step 2: Analyze print statement

    Printing result shows this evaluation output, not the original strings or errors.
  3. Final Answer:

    It prints a score or feedback comparing prediction to reference -> Option C
  4. Quick Check:

    Evaluate returns score/feedback [OK]
Hint: Evaluate returns comparison result, not original text
Common Mistakes:
  • Expecting evaluate to return input strings
  • Thinking evaluate raises error without extra args
  • Confusing prediction and reference outputs
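As a rough sketch of why `print(result)` shows a score rather than either input string, here is a stand-in `SomeEvaluator` that returns the fraction of reference words found in the prediction. The class body is an assumption made for illustration; the question does not say how `SomeEvaluator` is implemented, and a real evaluator may return richer feedback than a single float.

```python
class SomeEvaluator:
    """Hypothetical stand-in: scores word overlap between prediction and reference."""

    def evaluate(self, prediction: str, reference: str) -> float:
        reference_words = reference.lower().split()
        if not reference_words:
            return 0.0
        prediction_words = set(prediction.lower().split())
        hits = sum(1 for word in reference_words if word in prediction_words)
        return hits / len(reference_words)


evaluator = SomeEvaluator()
prediction = "The sky is blue."
reference = "The sky is clear and blue."
result = evaluator.evaluate(prediction, reference)
print(isinstance(result, float), round(result, 2))  # True 0.67
```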
4. What is the error in this code snippet?
evaluator = SomeEvaluator()
result = evaluator.evaluate(reference, prediction)
print(result)

Assume evaluate expects its arguments in (prediction, reference) order.
medium
A. Arguments are reversed; prediction should come before reference
B. Missing import statement for SomeEvaluator
C. evaluate method does not exist on evaluator
D. print statement syntax is incorrect

Solution

  1. Step 1: Check argument order

    The evaluate method expects prediction first, then reference, but the code passes them in the opposite order.
  2. Step 2: Confirm other parts are correct

    Assuming SomeEvaluator is imported and evaluate exists, the main issue is argument order.
  3. Final Answer:

    Arguments are reversed; prediction should come before reference -> Option A
  4. Quick Check:

    Correct argument order = prediction, reference [OK]
Hint: Remember evaluate(prediction, reference) argument order
Common Mistakes:
  • Swapping prediction and reference arguments
  • Assuming missing imports cause this error
  • Thinking print syntax is wrong
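Swapped arguments usually do not raise an exception; with an asymmetric metric they silently produce a different score. The sketch below illustrates this with a made-up metric, `coverage` (an assumption for illustration, not a LangSmith function), and shows one defensive design choice: keyword-only parameters, so the two strings cannot be swapped by position.

```python
def coverage(*, prediction: str, reference: str) -> float:
    """Hypothetical asymmetric metric: share of reference words found in the prediction."""
    reference_words = reference.split()
    if not reference_words:
        return 0.0
    prediction_words = set(prediction.split())
    return sum(word in prediction_words for word in reference_words) / len(reference_words)


prediction = "The sky is blue."
reference = "The sky is clear and blue."

correct = coverage(prediction=prediction, reference=reference)
swapped = coverage(prediction=reference, reference=prediction)  # roles reversed
print(round(correct, 2), swapped)  # 0.67 1.0

# The bare * in the signature makes both parameters keyword-only,
# so a positional call is rejected instead of being silently misread.
try:
    coverage(reference, prediction)
    positional_error = None
except TypeError as exc:
    positional_error = type(exc).__name__
print(positional_error)  # TypeError
```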
5. You want to compare multiple AI model outputs to a single reference answer using LangSmith evaluators. Which approach correctly applies evaluators to get scores for each prediction?
hard
A. Combine all predictions into one string and evaluate against reference once
B. Call evaluator.evaluate once with a list of predictions and one reference
C. Use evaluator.evaluate(reference, prediction) inside a loop over references
D. Loop over predictions, call evaluator.evaluate(prediction, reference) for each, collect results

Solution

  1. Step 1: Understand evaluator usage for multiple inputs

    Evaluators typically compare one prediction to one reference at a time.
  2. Step 2: Apply evaluator in a loop

    Looping over each prediction and calling evaluate separately gives individual scores.
  3. Step 3: Eliminate incorrect options

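    Passing a list or concatenating the predictions does not yield one score per prediction, and option C both reverses the argument order and loops over references instead of predictions.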
  4. Final Answer:

    Loop over predictions, call evaluator.evaluate(prediction, reference) for each, collect results -> Option D
  5. Quick Check:

    Evaluate each prediction separately in a loop [OK]
Hint: Evaluate predictions one by one in a loop against reference
Common Mistakes:
  • Passing lists instead of single strings
  • Mixing argument order
  • Combining predictions into one string
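The loop described in the solution can be sketched as follows. `ExactMatchEvaluator`, the model names, and the sample answers are hypothetical stand-ins chosen so the snippet runs on its own; any evaluator exposing `evaluate(prediction, reference)` would slot into the same loop.

```python
class ExactMatchEvaluator:
    """Hypothetical stand-in: 1.0 on a case-insensitive exact match, else 0.0."""

    def evaluate(self, prediction: str, reference: str) -> float:
        return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0


evaluator = ExactMatchEvaluator()
reference = "Paris"
predictions = {"model_a": "Paris", "model_b": "paris ", "model_c": "Lyon"}

# One evaluate call per prediction, always in (prediction, reference) order.
scores = {
    name: evaluator.evaluate(prediction, reference)
    for name, prediction in predictions.items()
}
print(scores)  # {'model_a': 1.0, 'model_b': 1.0, 'model_c': 0.0}
```

Keeping the scores keyed by model name makes it straightforward to rank or compare the models afterwards.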