An AI agent answers 100 questions. It correctly answers 80 questions but also gives 20 wrong answers. What is the precision of the agent?
Precision is the number of correct answers divided by total answers given.
Precision = Correct answers / (Correct + Wrong answers) = 80 / (80 + 20) = 0.8
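The same calculation can be checked with a short Python sketch (variable names are illustrative, not from the source):

```python
# Precision: fraction of the answers the agent gave that were correct.
correct = 80
wrong = 20
precision = correct / (correct + wrong)
print(precision)  # 0.8
```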
Which statement best describes the difference between accuracy and relevance when measuring an AI agent's output?
Think about correctness versus usefulness.
Accuracy is about correct answers. Relevance is about how well the answer fits the question's intent.
Given the following code calculating precision and recall, what is the printed F1 score?
correct = 70
wrong = 30
relevant = 80
precision = correct / (correct + wrong)
recall = correct / relevant
f1_score = 2 * (precision * recall) / (precision + recall)
print(round(f1_score, 2))
Calculate precision and recall first, then use the F1 formula.
Precision = 70/(70+30) = 0.7; Recall = 70/80 = 0.875; F1 = 2*(0.7*0.875)/(0.7+0.875) ≈ 0.78
What is wrong with the following code when it is used to calculate accuracy?
correct = 50
wrong = 20
accuracy = correct / wrong
print(accuracy)
Think about how accuracy is calculated.
Accuracy should be correct / (correct + wrong), but the code divides correct by wrong, giving 2.5, which is not a valid accuracy (accuracy must lie between 0 and 1).
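A corrected version of the snippet, dividing by the total number of answers instead of by the wrong count:

```python
correct = 50
wrong = 20
# Fix: accuracy is correct answers over ALL answers, not correct over wrong.
accuracy = correct / (correct + wrong)
print(round(accuracy, 2))  # 0.71
```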
You want to measure how well an AI agent's answers match the user's intent, focusing on usefulness rather than just correctness. Which metric is best suited for this?
Consider metrics that evaluate ranking or relevance rather than exact correctness.
MRR (Mean Reciprocal Rank) measures how high the first relevant answer appears in a ranked list, capturing usefulness and relevance better than simple accuracy or precision.
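A minimal sketch of how MRR could be computed (the `mean_reciprocal_rank` helper and the example data are hypothetical, not from the source):

```python
def mean_reciprocal_rank(ranked_results):
    """MRR: average over queries of 1 / rank of the first relevant result.

    ranked_results: for each query, a list of booleans marking which
    results in ranked order are relevant to the user's intent.
    """
    total = 0.0
    for results in ranked_results:
        for rank, is_relevant in enumerate(results, start=1):
            if is_relevant:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(ranked_results)

# Three queries: relevant answer at rank 1, rank 2, and rank 3 respectively.
queries = [
    [True, False],
    [False, True],
    [False, False, True],
]
print(round(mean_reciprocal_rank(queries), 3))  # (1 + 1/2 + 1/3) / 3 ≈ 0.611
```

A higher MRR means relevant answers tend to appear nearer the top of the agent's ranked output.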