When combining retrieved context with a large language model (LLM), the central question is whether the model's output correctly and completely uses that context. The goal is to produce answers grounded in the retrieved information, so metrics like precision and recall measure how well the model draws on the right context without adding wrong or irrelevant details.
For example, if the model retrieves documents to answer a question, precision measures what fraction of the facts stated in the answer are supported by the retrieved documents, while recall measures what fraction of the relevant facts in those documents actually appear in the answer. Balancing the two ensures the LLM output is both accurate (few unsupported claims) and complete (few omissions).
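As a minimal sketch of this idea, the snippet below computes precision and recall by treating the answer and the retrieved documents as sets of atomic fact strings. This is a simplifying assumption for illustration: in practice, extracting comparable "facts" from free text requires its own NLP step (e.g. claim extraction and entailment checking), and the function name `precision_recall` is hypothetical, not from any particular library.

```python
def precision_recall(answer_facts, reference_facts):
    """Score an answer's facts against the facts supported by retrieved documents.

    answer_facts: facts asserted in the LLM's answer (assumed pre-extracted)
    reference_facts: correct, relevant facts from the retrieved documents
    """
    answer_facts = set(answer_facts)
    reference_facts = set(reference_facts)
    # Facts the answer states that the documents actually support
    supported = answer_facts & reference_facts
    # Precision: of the facts the answer asserts, how many are supported?
    precision = len(supported) / len(answer_facts) if answer_facts else 0.0
    # Recall: of the relevant facts available, how many made it into the answer?
    recall = len(supported) / len(reference_facts) if reference_facts else 0.0
    return precision, recall


# Hypothetical example: the answer asserts 3 facts, 2 of which are supported,
# while the documents contain 4 relevant facts in total.
answer = {"Paris is the capital of France",
          "France uses the euro",
          "France borders Italy"}
reference = {"Paris is the capital of France",
             "France uses the euro",
             "France is in the EU",
             "France borders Spain"}
p, r = precision_recall(answer, reference)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.67, recall=0.50
```

A high-precision, low-recall answer is correct but incomplete; a high-recall, low-precision answer covers the documents but mixes in unsupported claims, which is why both numbers are tracked together.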