In conversation management, the key metrics are Precision, Recall, and F1 score. Together they measure how accurately the system interprets user inputs and how reliably it responds to them correctly.
Precision is the share of the system's responses that were actually correct and relevant.
Recall is the share of user intents or questions that the system successfully recognized and answered.
F1 score is the harmonic mean of precision and recall, so it gives a single measure of overall performance that is high only when both are high.
We focus on these because a conversation system should avoid giving wrong answers (high precision) and also avoid missing user requests (high recall).
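To make the three definitions concrete, here is a minimal Python sketch that computes them from raw counts. The `TurnCounts` class, its field names, and the example numbers are hypothetical and chosen only for illustration. The sketch assumes each evaluated user turn has already been labeled as a correct response, a wrong response, or a missed request. How those labels are produced depends on your own evaluation setup.

```python
from dataclasses import dataclass


@dataclass
class TurnCounts:
    """Hypothetical tally of labeled outcomes over a set of user turns."""
    true_positives: int   # system responded and the response was correct
    false_positives: int  # system responded but the response was wrong or irrelevant
    false_negatives: int  # user request the system missed or failed to answer


def precision(c: TurnCounts) -> float:
    # Of everything the system answered, how much was correct?
    answered = c.true_positives + c.false_positives
    return c.true_positives / answered if answered else 0.0


def recall(c: TurnCounts) -> float:
    # Of everything the user actually asked for, how much was handled correctly?
    requested = c.true_positives + c.false_negatives
    return c.true_positives / requested if requested else 0.0


def f1_score(c: TurnCounts) -> float:
    # Harmonic mean of precision and recall; defined as 0.0 when both are 0.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative numbers only: 80 correct responses, 20 wrong ones, 10 missed requests.
counts = TurnCounts(true_positives=80, false_positives=20, false_negatives=10)
print(f"precision={precision(counts):.3f}")  # precision=0.800
print(f"recall={recall(counts):.3f}")        # recall=0.889
print(f"f1={f1_score(counts):.3f}")          # f1=0.842
```

With these made-up counts the system is wrong in 20 of its 100 responses (precision 0.8) and handles 80 of the 90 real requests (recall about 0.889). The F1 score of about 0.842 sits between the two, closer to the lower value, which is the usual behavior of a harmonic mean.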