
Open-domain QA basics in NLP - Model Pipeline Trace

Model Pipeline - Open-domain QA basics

This pipeline answers a question by searching a large collection of documents and then extracting the best answer from what it finds. A retrieval step first selects the most relevant passages; a reading comprehension model then reads those passages to locate the exact answer span.

Data Flow - 5 Stages
1. Input Question
   Input: 1 question string
   Operation: Receive a natural-language question from the user
   Output: 1 question string
   Example: "What is the capital of France?"

2. Document Retrieval
   Input: 1 question string
   Operation: Search a large text database to find the top relevant documents or passages
   Output: 5 passages x 100 words each
   Example: ["Paris is the capital city of France...", "France's largest city is Paris..."]

3. Context Preparation
   Input: 5 passages x 100 words each
   Operation: Combine retrieved passages into a single context for reading
   Output: 1 context string (~500 words)
   Example: "Paris is the capital city of France. It is known for... France's largest city is Paris..."

4. Answer Extraction Model
   Input: 1 question string + 1 context string
   Operation: Use a reading comprehension model to find the answer span in the context
   Output: 1 answer string
   Example: "Paris"

5. Output Answer
   Input: 1 answer string
   Operation: Return the extracted answer to the user
   Output: 1 answer string
   Example: "Paris"
Training Trace - Epoch by Epoch

[Plot: training loss (vertical axis, 0.4 to 1.2) against epoch (horizontal axis, 1 to 5)]

Epoch  Loss ↓  Accuracy ↑  Observation
1      1.2     0.45        Model starts learning to locate answers in text.
2      0.9     0.60        Model improves understanding of question and context.
3      0.7     0.72        Model better identifies correct answer spans.
4      0.5     0.80        Model converges with good answer extraction ability.
5      0.4     0.85        Final fine-tuning improves accuracy slightly.
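
The page does not say how loss and accuracy are computed. The sketch below assumes a common setup for extractive readers: the loss is the average negative log-likelihood of the correct start and end positions, and accuracy is the fraction of exact-match answers. The function names and the scores in the usage lines are hypothetical.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def span_loss(start_scores, end_scores, gold_start, gold_end):
    """Average negative log-likelihood of the gold start and end positions."""
    start_nll = -log_softmax(start_scores)[gold_start]
    end_nll = -log_softmax(end_scores)[gold_end]
    return (start_nll + end_nll) / 2.0

def exact_match_accuracy(predictions, gold_answers):
    """Fraction of predictions equal to the gold answer (case-insensitive)."""
    matches = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, gold_answers)
    )
    return matches / len(gold_answers)

# An untrained model with uniform scores versus one favoring the gold position
untrained = span_loss([0.0] * 4, [0.0] * 4, gold_start=1, gold_end=2)
trained = span_loss([0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0], gold_start=1, gold_end=2)
print(untrained > trained)
```

As the model assigns higher scores to the correct start and end positions, this loss shrinks, which matches the downward trend in the table.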
Prediction Trace - 5 Layers
Layer 1: Input Question
Layer 2: Document Retrieval
Layer 3: Context Preparation
Layer 4: Answer Extraction Model
Layer 5: Output Answer
Model Quiz - 3 Questions
Test your understanding
What is the main role of the Document Retrieval stage?
A. Generate the final answer directly
B. Find relevant text passages related to the question
C. Combine passages into one context
D. Train the model to understand questions
Key Insight
Open-domain QA works by first retrieving relevant passages, then reading them carefully to extract the exact answer. Training improves the model's ability to locate answers: in the training trace, loss falls from 1.2 to 0.4 and accuracy rises from 0.45 to 0.85 over five epochs.