Experiment - RAG evaluation metrics
Problem: You have a Retrieval-Augmented Generation (RAG) model that combines retrieved documents with a generative model to answer questions. You want to evaluate how well the model answers questions using standard metrics.
Current Metrics: Exact Match (EM): 55%, F1 Score: 62%, ROUGE-L: 58%
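The three metrics above can be computed from scratch with the standard library. Below is a minimal sketch using SQuAD-style answer normalization (lowercasing, stripping punctuation and articles); EM checks normalized string equality, F1 is token-level overlap, and ROUGE-L is an F-measure over the longest common subsequence of tokens. All function names here are illustrative, not from any particular library.

```python
import re
import string
from collections import Counter

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation, articles, extra spaces."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    """1.0 if the normalized prediction equals the normalized gold answer, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1_score(pred, gold):
    """Token-level F1 between prediction and gold answer."""
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def rouge_l(pred, gold):
    """ROUGE-L F-measure: longest common subsequence over normalized tokens."""
    a, b = normalize(pred).split(), normalize(gold).split()
    # Dynamic-programming table for LCS length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(a)][len(b)]
    if lcs == 0:
        return 0.0
    p, r = lcs / len(a), lcs / len(b)
    return 2 * p * r / (p + r)

if __name__ == "__main__":
    pred, gold = "The capital of France is Paris", "Paris is the capital of France"
    print(f"EM={exact_match(pred, gold):.2f}  "
          f"F1={f1_score(pred, gold):.2f}  ROUGE-L={rouge_l(pred, gold):.2f}")
```

Note how the example separates the metrics: the token bags are identical, so F1 is 1.0, but word order differs, so EM is 0 and ROUGE-L sits in between. Corpus-level scores are simply the mean over all question-answer pairs.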
Issue: The scores are only moderate, and the three metrics all measure surface overlap with the gold answer. You want to extend the evaluation with more comprehensive metrics and verify that the code computes them correctly.
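One common extension for RAG systems is to score the retrieval stage separately from answer quality, since weak retrieval caps the generator's ceiling. A minimal sketch of two standard retrieval metrics, recall@k and mean reciprocal rank (MRR), is below; the document IDs and function names are hypothetical placeholders for whatever your pipeline produces.

```python
def recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mrr(retrieved_ids, relevant_ids):
    """Reciprocal rank of the first relevant document; 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

if __name__ == "__main__":
    # Hypothetical example: the retriever returned d3, d1, d7 for a question
    # whose relevant documents are d1 and d9.
    retrieved = ["d3", "d1", "d7"]
    relevant = ["d1", "d9"]
    print(f"recall@2={recall_at_k(retrieved, relevant, k=2):.2f}  "
          f"MRR={mrr(retrieved, relevant):.2f}")
```

Reporting these alongside EM/F1/ROUGE-L makes it clear whether a low answer score stems from retrieval misses or from the generator; averaging both over the evaluation set gives the corpus-level numbers.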