
RAG evaluation metrics in Prompt Engineering / GenAI - Model Pipeline Trace

Model Pipeline - RAG evaluation metrics

This pipeline shows how a Retrieval-Augmented Generation (RAG) model processes input text by retrieving relevant documents, generating answers, and then evaluating the quality of those answers using specific metrics.

Data Flow - 4 Stages

Stage 1: Input Text
Input: 1 sample x 1 text string
Operation: User provides a question or prompt
Output: 1 sample x 1 text string
Example: "What is the capital of France?"

Stage 2: Document Retrieval
Input: 1 sample x 1 text string
Operation: Retrieve the top relevant documents from the knowledge base
Output: 1 sample x 5 documents (text)
Example: ["Paris is the capital of France.", "France is in Europe.", "The Eiffel Tower is in Paris.", "Paris has many museums.", "French cuisine is famous."]

Stage 3: Answer Generation
Input: 1 sample x 5 documents
Operation: Generate an answer using the retrieved documents and the input question
Output: 1 sample x 1 generated answer string
Example: "The capital of France is Paris."

Stage 4: Evaluation Metrics Calculation
Input: 1 sample x 1 generated answer string + 1 reference answer string
Operation: Calculate metrics such as Exact Match, F1 Score, and ROUGE-L
Output: 1 sample x 3 metric scores
Example: {"Exact Match": 1.0, "F1 Score": 1.0, "ROUGE-L": 0.85}
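The Stage 4 metrics can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the normalization rules here (lowercasing, punctuation stripping) are an assumption, and production evaluations typically use libraries such as `rouge-score` or Hugging Face `evaluate`. Exact Match checks normalized equality, F1 compares token overlap, and ROUGE-L scores the longest common subsequence.

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation, split into tokens (assumed normalization)."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def exact_match(pred, ref):
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(pred) == normalize(ref))

def f1_score(pred, ref):
    """Token-overlap F1 between prediction and reference (order-insensitive)."""
    p, r = normalize(pred), normalize(ref)
    common = sum((Counter(p) & Counter(r)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(r)
    return 2 * precision * recall / (precision + recall)

def rouge_l(pred, ref):
    """ROUGE-L F1, based on the longest common subsequence (LCS) of tokens."""
    p, r = normalize(pred), normalize(ref)
    # LCS length via dynamic programming
    dp = [[0] * (len(r) + 1) for _ in range(len(p) + 1)]
    for i in range(len(p)):
        for j in range(len(r)):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if p[i] == r[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[len(p)][len(r)]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

pred = "The capital of France is Paris."
ref = "Paris is the capital of France."
print(exact_match(pred, ref))        # 0.0 (word order differs)
print(f1_score(pred, ref))           # 1.0 (identical tokens, order ignored)
print(round(rouge_l(pred, ref), 3))  # 0.667 (LCS is "the capital of france")
```

Note how the three metrics disagree on the same pair: Exact Match is strict, F1 ignores word order entirely, and ROUGE-L sits in between by rewarding long in-order subsequences.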
Training Trace - Epoch by Epoch

Loss
1.0 |***************
0.8 |************   
0.6 |********      
0.4 |******        
0.2 |***           
0.0 +-------------
     1 2 3 4 5 Epochs
Epoch | Loss ↓ | Accuracy ↑ | Observation
------|--------|------------|------------------------------------------------
1     | 0.85   | 0.60       | Initial training with moderate loss and accuracy.
2     | 0.65   | 0.72       | Loss decreased, accuracy improved as the model learns retrieval and generation.
3     | 0.50   | 0.80       | Better alignment between retrieved documents and generated answers.
4     | 0.40   | 0.85       | Model shows strong retrieval and generation performance.
5     | 0.35   | 0.88       | Training converges with high accuracy and low loss.
Prediction Trace - 4 Layers
Layer 1: Input Question
Layer 2: Document Retrieval
Layer 3: Answer Generation
Layer 4: Evaluation Metrics Calculation
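The four prediction layers can be wired together as a toy pipeline. This is a sketch under loud assumptions: the keyword-overlap retriever stands in for a real embedding or BM25 retriever, and the "generator" that simply returns the top-ranked document stands in for an LLM call.

```python
def retrieve(question, corpus, k=5):
    """Rank documents by word overlap with the question (toy retriever).
    A real RAG system would use embeddings or BM25 instead."""
    q_tokens = set(question.lower().rstrip("?").split())

    def overlap(doc):
        return len(q_tokens & set(doc.lower().rstrip(".").split()))

    return sorted(corpus, key=overlap, reverse=True)[:k]

def generate(question, docs):
    """Stand-in for an LLM call: just return the highest-ranked document."""
    return docs[0]

corpus = [
    "Paris is the capital of France.",
    "France is in Europe.",
    "The Eiffel Tower is in Paris.",
    "Paris has many museums.",
    "French cuisine is famous.",
]
question = "What is the capital of France?"   # Layer 1: input question
docs = retrieve(question, corpus)             # Layer 2: document retrieval
answer = generate(question, docs)             # Layer 3: answer generation
print(answer)  # "Paris is the capital of France."
```

Layer 4 would then score `answer` against a reference string, producing the metric dictionary shown in the data-flow section.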
Model Quiz
Test your understanding
Which stage expands the input from one question to multiple documents?
A. Evaluation Metrics Calculation
B. Answer Generation
C. Document Retrieval
D. Input Text
Key Insight
RAG models combine a retrieval step with a generation step, and metrics such as Exact Match, F1 Score, and ROUGE-L measure how closely the generated answers match reference answers. The training trace shows steady improvement as the model learns to retrieve relevant documents and generate accurate responses.