
Extractive QA concept in NLP - Model Pipeline Trace

Model Pipeline - Extractive QA concept

This pipeline answers a question by selecting an exact text span from a given passage. It reads the question and the passage, then highlights the answer inside the passage.

Data Flow - 5 Stages
Stage 1: Input Data
  Input:   1000 samples x 2 texts (question, passage)
  Step:    Receive pairs of question and passage texts
  Output:  1000 samples x 2 texts (question, passage)
  Example: Question: 'Where is the Eiffel Tower located?'; Passage: 'The Eiffel Tower is in Paris, France, and is a famous landmark.'

Stage 2: Tokenization
  Input:   1000 samples x 2 texts
  Step:    Split texts into tokens (words or subwords)
  Output:  1000 samples x 2 token lists (question tokens, passage tokens)
  Example: Question tokens: ['Where', 'is', 'the', 'Eiffel', 'Tower', 'located', '?']; Passage tokens: ['The', 'Eiffel', 'Tower', 'is', 'in', 'Paris', ',', 'France', ',', 'and', 'is', 'a', 'famous', 'landmark', '.']

Stage 3: Input Encoding
  Input:   1000 samples x 2 token lists
  Step:    Convert tokens to numerical vectors using embeddings
  Output:  1000 samples x 2 sequences of vectors (e.g., 768-dim)
  Example: Question vectors: [[0.1, 0.3, ...], ...]; Passage vectors: [[0.2, 0.4, ...], ...]

Stage 4: Model Forward Pass
  Input:   1000 samples x 2 sequences of vectors
  Step:    Use a neural network (e.g., BERT) to predict the start and end positions of the answer in the passage
  Output:  1000 samples x 2 probability distributions over passage tokens (start and end)
  Example: Start probs: [0.01, 0.02, 0.7, 0.1, ...]; End probs: [0.01, 0.02, 0.1, 0.6, ...]

Stage 5: Answer Extraction
  Input:   1000 samples x 2 probability distributions
  Step:    Select the token span with the highest combined start and end probabilities
  Output:  1000 samples x 1 text span (answer)
  Example: Answer: 'Paris, France'
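The five stages above can be sketched end to end in plain Python. Everything here is illustrative: the forward pass is a hard-coded stub standing in for a fine-tuned transformer (e.g. BERT), and the probabilities mirror the example values in the trace.

```python
import re

def tokenize(text):
    # Stage 2: naive word/punctuation split (real systems use subword tokenizers)
    return re.findall(r"\w+|[^\w\s]", text)

def encode(tokens):
    # Stage 3: stand-in for an embedding lookup; a real encoder maps each
    # token to a ~768-dim vector, here a 1-dim placeholder suffices
    return [[float(len(t))] for t in tokens]

def model_forward(question_vecs, passage_vecs, passage_tokens):
    # Stage 4: a real model outputs start/end probability distributions
    # over passage tokens; this stub simply peaks on 'Paris' ... 'France'
    n = len(passage_tokens)
    start = [0.01] * n
    end = [0.01] * n
    start[passage_tokens.index("Paris")] = 0.70
    end[passage_tokens.index("France")] = 0.60
    return start, end

def extract_span(start, end, tokens, max_len=10):
    # Stage 5: pick the span (i, j) with i <= j maximizing start[i] * end[j]
    best_i, best_j, best_score = 0, 0, -1.0
    for i in range(len(tokens)):
        for j in range(i, min(i + max_len, len(tokens))):
            score = start[i] * end[j]
            if score > best_score:
                best_i, best_j, best_score = i, j, score
    return tokens[best_i:best_j + 1]

def detokenize(tokens):
    # Rejoin tokens, attaching punctuation to the preceding word
    out = ""
    for t in tokens:
        out += t if (not out or t in ",.!?;:") else " " + t
    return out

question = "Where is the Eiffel Tower located?"
passage = "The Eiffel Tower is in Paris, France, and is a famous landmark."

p_tokens = tokenize(passage)
start_probs, end_probs = model_forward(encode(tokenize(question)), encode(p_tokens), p_tokens)
answer = detokenize(extract_span(start_probs, end_probs, p_tokens))
print(answer)  # Paris, France
```

The span search deliberately enforces start <= end; taking independent argmaxes of the two distributions can produce an invalid (reversed) span.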
Training Trace - Epoch by Epoch

Loss per epoch (bar length proportional to loss):

Epoch 1  1.20 |************
Epoch 2  0.80 |********
Epoch 3  0.50 |*****
Epoch 4  0.35 |****
Epoch 5  0.30 |***
Epoch  Loss ↓  Accuracy ↑  Observation
1      1.20    0.45        Model starts learning; loss is high, accuracy low
2      0.80    0.60        Loss decreases; accuracy improves as the model learns to locate answers
3      0.50    0.75        Model shows good understanding; loss continues to drop
4      0.35    0.82        Training converges; accuracy stabilizes near 82%
5      0.30    0.85        Final epoch; model achieves strong performance
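The loss tracked above is typically the standard extractive-QA objective: the average of two cross-entropies, one on the gold start position and one on the gold end position. A minimal sketch, where the logit values are made up purely to contrast an untrained model with a trained one:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def span_loss(start_logits, end_logits, true_start, true_end):
    # Average of the cross-entropies on the gold start and end positions,
    # the usual training objective for extractive-QA heads.
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    return -(math.log(p_start[true_start]) + math.log(p_end[true_end])) / 2

# Gold answer span is tokens (2, 3). Flat logits mimic an untrained model;
# peaked logits mimic a model that has learned to locate the span.
flat = [0.0] * 5
peaked_start = [0.0, 0.0, 4.0, 0.0, 0.0]
peaked_end = [0.0, 0.0, 0.0, 4.0, 0.0]

print(span_loss(flat, flat, 2, 3))                # high, like early epochs
print(span_loss(peaked_start, peaked_end, 2, 3))  # low, like late epochs
```

As training sharpens the start/end distributions around the gold positions, this loss falls, which is exactly the trend in the epoch table.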
Prediction Trace - 5 Layers
Layer 1: Tokenization
Layer 2: Input Encoding
Layer 3: Model Forward Pass
Layer 4: Answer Extraction
Layer 5: Detokenization
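Layer 5 exists because subword tokenizers split rare words into pieces; a WordPiece-style tokenizer might produce ['Ei', '##ffel'], for example. A minimal merger for that convention (a real tokenizer's decode step also handles casing and punctuation spacing):

```python
def merge_wordpieces(pieces):
    # WordPiece convention: a piece starting with "##" continues
    # the previous token, so it is attached without a space.
    words = []
    for p in pieces:
        if p.startswith("##") and words:
            words[-1] += p[2:]
        else:
            words.append(p)
    return " ".join(words)

print(merge_wordpieces(["Ei", "##ffel", "Tower"]))  # Eiffel Tower
```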
Model Quiz - 3 Questions
Test your understanding
What does the model predict to find the answer in the passage?
A. Only the first word of the passage
B. Start and end positions of the answer span
C. The entire passage as the answer
D. A summary of the passage
Key Insight
Extractive QA models learn to locate exact answer spans by predicting start and end positions in the passage. As training progresses, the model improves by reducing loss and increasing accuracy, enabling precise answer extraction from text.