When we measure how well an agent performs, two key ideas matter: accuracy and relevance.
Accuracy measures how often the agent's answers or actions are correct; it is a direct indicator of how reliable the agent is.
Relevance measures how well the agent's responses match the user's needs or question. Even a correct answer is of little use if it does not address what was actually asked.
To measure these, we use metrics like Precision, Recall, and F1 score. Precision tells us what fraction of the agent's positive answers were truly correct. Recall tells us what fraction of the true correct answers the agent found. The F1 score is the harmonic mean of the two, so it balances both.
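As a sketch of how these metrics are computed, the snippet below derives Precision, Recall, and F1 from raw counts of true positives, false positives, and false negatives (the function name and example counts are illustrative, not from the original text):

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute Precision, Recall, and F1 from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example: the agent flagged 10 items as positive, 8 of which were correct
# (tp=8, fp=2), and it missed 2 true positives (fn=2).
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.8 0.8 0.8
```

Note the guards against division by zero: an agent that returns no positives at all still gets a well-defined (zero) score.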
For agents, relevance can also be measured through user feedback or through similarity scores that compare the agent's output to an expected result.
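One simple way to get such a similarity score, assuming a basic token-overlap approach (Jaccard similarity) rather than any particular library's method, is:

```python
def jaccard_similarity(output: str, expected: str) -> float:
    """Token-level Jaccard similarity: |intersection| / |union| of word sets."""
    a = set(output.lower().split())
    b = set(expected.lower().split())
    if not a and not b:
        return 1.0  # two empty strings are trivially identical
    return len(a & b) / len(a | b)

score = jaccard_similarity("the capital of France is Paris",
                           "Paris is the capital of France")
print(score)  # 1.0 -- same word sets, regardless of order
```

This word-overlap proxy is crude (it ignores word order and meaning); in practice, embedding-based similarity or human feedback gives a fuller picture of relevance.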