Complete the code to create an evaluation that checks if the output matches the expected answer.
from langchain.evaluation import StringEvaluator
evaluator = StringEvaluator()
result = evaluator.evaluate(output=[1], reference="Hello World")
The evaluate method requires the actual output to compare with the reference. Here, output is the correct variable holding the generated result.
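The completed call can be tried in isolation with the minimal sketch below. ExactMatchEvaluator is a hypothetical stand-in, not a LangChain class: it only mirrors the evaluate(output=..., reference=...) signature used in this exercise and returns 1.0 on an exact match, 0.0 otherwise. As far as I know, LangChain's built-in string evaluators are loaded with load_evaluator and called through evaluate_strings(prediction=..., reference=...), so the signature here should be read as the exercise's simplification.

```python
class ExactMatchEvaluator:
    """Hypothetical stand-in that mirrors the exercise's evaluate() signature."""

    def evaluate(self, output: str, reference: str) -> float:
        # Score 1.0 for an exact match after trimming whitespace, else 0.0.
        return 1.0 if output.strip() == reference.strip() else 0.0


evaluator = ExactMatchEvaluator()
output = "Hello World"  # the generated text being checked

# Passing the variable `output` fills the blank in the exercise.
result = evaluator.evaluate(output=output, reference="Hello World")
mismatch = evaluator.evaluate(output="Hello", reference="Hello World")
print(result, mismatch)
```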
Complete the code to run an evaluation on a chain's output using LangChain's evaluation tools.
from langchain.chains import LLMChain
from langchain.evaluation import StringEvaluator
chain = LLMChain(llm=llm, prompt=prompt)
output = chain.run("What is 2 + 2?")
evaluator = StringEvaluator()
score = evaluator.evaluate(output=[1], reference="4")
The variable output holds the chain's generated answer, which is what the evaluator needs to check against the reference.
Fix the error in the evaluation code by completing the blank with the correct method to get the chain's output.
output = chain.[1]("Calculate 5 times 3")
score = evaluator.evaluate(output=output, reference="15")
Common mistakes are using execute or process, or confusing call with run. The run method is the correct way to execute the chain and get its output as a string.
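The difference between running a chain and calling it can be sketched as follows, assuming the classic chain interface in which calling a chain returns a dictionary of outputs while run returns the single output as a string. StubChain and exact_match are hypothetical stand-ins that need no LLM; the "text" key is likewise an assumption made for illustration.

```python
class StubChain:
    """Hypothetical stand-in for a chain; answers come from a lookup table."""

    def __init__(self, answers: dict):
        self.answers = answers

    def __call__(self, question: str) -> dict:
        # Calling the chain directly returns a dict of named outputs.
        return {"text": self.answers.get(question, "")}

    def run(self, question: str) -> str:
        # run() unwraps the single output and returns it as a plain string.
        return self(question)["text"]


def exact_match(output: str, reference: str) -> float:
    # Hypothetical scorer: 1.0 on an exact string match, else 0.0.
    return 1.0 if output == reference else 0.0


chain = StubChain({"Calculate 5 times 3": "15"})
output = chain.run("Calculate 5 times 3")  # a string, ready to evaluate
raw = chain("Calculate 5 times 3")         # a dict, not directly comparable
score = exact_match(output=output, reference="15")
print(type(output).__name__, type(raw).__name__, score)
```

Because the evaluator compares strings, the string returned by run can be passed straight in, whereas the dictionary from a direct call would first need its value extracted.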
Fill both blanks to create a dictionary comprehension that evaluates outputs only if the score is above a threshold.
results = {
    output: evaluator.evaluate(output=output, reference=ref)
    for output, ref in outputs.items()
    if evaluator.evaluate(output=output, reference=ref) [1] [2]
}
The comprehension keeps only the entries whose evaluation score is greater than 0.8, so only good matches are included.
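A threshold filter of this kind can be sketched with a graded scorer so that 0.8 is a meaningful cutoff. The similarity function below is a hypothetical scorer built on the standard library's difflib, not a LangChain evaluator, and the sample outputs are invented for illustration. The sketch also scores each pair once instead of calling the evaluator twice per entry.

```python
from difflib import SequenceMatcher


def similarity(output: str, reference: str) -> float:
    # Hypothetical scorer: character-level similarity in the range [0, 1].
    return SequenceMatcher(None, output, reference).ratio()


# Maps each generated output to its reference answer, as in the exercise.
outputs = {
    "Hello World": "Hello World",
    "Hello Wrld": "Hello World",
    "Goodbye": "Hello World",
}

# Score every pair once, then keep only the pairs above the threshold.
scores = {output: similarity(output, ref) for output, ref in outputs.items()}
results = {output: score for output, score in scores.items() if score > 0.8}
print(sorted(results))
```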
Fill all three blanks to define a function that evaluates a list of outputs and returns those with scores above a threshold.
def filter_good_outputs(outputs, evaluator, threshold):
    return {
        output: score
        for output in outputs
        if (score := evaluator.evaluate(output=output, reference=outputs[output])) [1] threshold
        and score [2] 1
        and output [3] outputs
    }
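The function keeps outputs whose scores are greater than the threshold and less than or equal to 1, which guards against invalid scores. It also checks that the output is a key in the outputs dictionary; because the loop already iterates over those keys, this last check always passes.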
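A runnable variant of the function is sketched below. FixedScoreEvaluator is a hypothetical stand-in that returns preset scores, and the sample data is invented. The sketch iterates over outputs.items() and uses a chained comparison, which makes the membership check unnecessary; it requires Python 3.8 or later for the := operator.

```python
class FixedScoreEvaluator:
    """Hypothetical evaluator that returns preset scores, for demonstration."""

    def __init__(self, scores: dict):
        self.scores = scores

    def evaluate(self, output: str, reference: str) -> float:
        return self.scores[output]


def filter_good_outputs(outputs, evaluator, threshold):
    # outputs maps each generated output to its reference answer.
    # Keep an entry only if threshold < score <= 1.
    return {
        output: score
        for output, reference in outputs.items()
        if threshold
        < (score := evaluator.evaluate(output=output, reference=reference))
        <= 1
    }


outputs = {"4": "4", "four": "4", "5": "4", "??": "4"}
# The 1.7 entry simulates an out-of-range score that the upper bound rejects.
evaluator = FixedScoreEvaluator({"4": 1.0, "four": 0.85, "5": 0.1, "??": 1.7})
good = filter_good_outputs(outputs, evaluator, threshold=0.8)
print(good)
```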