For transformer models in NLP, perplexity and accuracy are key metrics. Perplexity measures how well a language model predicts the next token: it is the exponential of the average negative log-likelihood per token, so lower is better. Accuracy is the fraction of correct predictions on tasks such as text classification. These metrics matter because they are how the gains transformers brought to language understanding and generation are measured: lower perplexity and higher accuracy indicate a stronger model.
Why transformers revolutionized NLP - Why Metrics Matter
For classification tasks using transformers, a confusion matrix shows how many examples were correctly or incorrectly labeled:
Actual \ Predicted | Positive | Negative
-------------------|----------|---------
Positive | TP=85 | FN=15
Negative | FP=10 | TN=90
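From these counts, precision = TP / (TP + FP) and recall = TP / (TP + FN). Precision shows how trustworthy the positive predictions are; recall shows how many of the actual positives the model finds.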
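As a minimal sketch, the metrics can be computed directly from the four counts in the matrix above. The F1 score (the harmonic mean of precision and recall) is not mentioned in the text and is included here only as a common companion metric; the helper name safe_div is illustrative.

```python
# Counts taken from the confusion matrix above.
tp, fn = 85, 15   # actual positives: predicted positive / predicted negative
fp, tn = 10, 90   # actual negatives: predicted positive / predicted negative

def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

precision = safe_div(tp, tp + fp)   # of the predicted positives, how many were right
recall = safe_div(tp, tp + fn)      # of the actual positives, how many were found
accuracy = safe_div(tp + tn, tp + tn + fp + fn)
f1 = safe_div(2 * precision * recall, precision + recall)  # added for illustration

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"accuracy={accuracy:.3f} f1={f1:.3f}")
# precision=0.895 recall=0.850 accuracy=0.875 f1=0.872
```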
Transformers can be tuned for different tasks. For example:
- High precision: In spam detection, transformers should avoid marking good emails as spam. So, precision is more important.
- High recall: In medical text analysis, transformers should catch all mentions of diseases. Missing any is bad, so recall is prioritized.
Understanding this tradeoff helps you choose the right settings for the task, such as the decision threshold applied to the model's scores.
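One common way to move along the precision/recall tradeoff is to change the score threshold at which an example is labeled positive. The sketch below assumes the classifier outputs a spam probability per email; the scores, labels, and the function name precision_recall are made up for illustration.

```python
# Hypothetical (spam probability, true label) pairs; 1 = spam, 0 = not spam.
scored = [
    (0.95, 1), (0.90, 1), (0.80, 0), (0.70, 1), (0.60, 1),
    (0.40, 0), (0.35, 1), (0.20, 0), (0.10, 0), (0.05, 0),
]

def precision_recall(scored, threshold):
    """Label an example positive when its score is >= threshold."""
    tp = sum(1 for s, y in scored if s >= threshold and y == 1)
    fp = sum(1 for s, y in scored if s >= threshold and y == 0)
    fn = sum(1 for s, y in scored if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for threshold in (0.3, 0.5, 0.85):
    p, r = precision_recall(scored, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
# threshold=0.30 precision=0.71 recall=1.00
# threshold=0.50 precision=0.80 recall=0.80
# threshold=0.85 precision=1.00 recall=0.40
```

On this toy data, raising the threshold trades recall for precision, which suits the spam case; lowering it does the opposite, which suits the medical case.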
As rough rules of thumb for transformer NLP models (exact thresholds depend on the dataset, tokenizer, and task):
- Good: Perplexity close to 10 or lower on language modeling, accuracy above 90% on classification, precision and recall balanced above 85%.
- Bad: High perplexity (100+), accuracy below 70%, or very low recall (below 50%) meaning the model misses many important cases.
Good metrics on held-out data suggest the transformer handles language well for that task.
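To make the perplexity numbers above concrete, here is a minimal sketch that computes perplexity as the exponential of the mean negative log-probability of the correct tokens. The probability lists are hypothetical, not output from a real model.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of the correct tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical probabilities a model assigned to each correct next token.
confident = [0.5, 0.4, 0.6, 0.5]
unsure = [0.01, 0.02, 0.005, 0.01]

print(round(perplexity(confident), 2))   # low: the model is rarely surprised
print(round(perplexity(unsure), 2))      # about 100: the model is often surprised
print(round(perplexity([0.1] * 5), 2))   # about 10: like a uniform guess over 10 options
```

A perplexity of 10 can be read as the model being, on average, as uncertain as if it were choosing uniformly among 10 tokens.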
- Accuracy paradox: High accuracy can be misleading if data is unbalanced. For example, if 95% of texts are negative, a model always predicting negative gets 95% accuracy but is useless.
- Data leakage: If test data leaks into training, metrics look great but the model fails in real use.
- Overfitting: Very low training loss but poor test metrics means the transformer memorized training data and won't generalize.
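The accuracy paradox from the list above can be reproduced in a few lines. This sketch uses the same 95% / 5% split and a stand-in "model" that always predicts the negative class.

```python
# 95 negative texts and 5 positive texts, as in the accuracy-paradox example.
labels = [0] * 95 + [1] * 5
predictions = [0] * len(labels)   # a "model" that always predicts negative

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
actual_pos = sum(labels)

accuracy = correct / len(labels)
recall = true_pos / actual_pos

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
# accuracy=0.95 recall=0.00
```

Reporting recall (or precision) alongside accuracy exposes the problem immediately.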
Your transformer model has 98% accuracy but only 12% recall on detecting spam emails. Is it good for production? Why or why not?
Answer: No, it is not good. A recall of 12% means the model misses most spam emails, so many spam messages get through. The high accuracy is misleading because most emails are not spam, so a model that predicts "not spam" almost every time still scores well. Even where precision is the priority, recall this low means the filter catches only about 1 in 8 spam emails.
Practice
Solution
Step 1: Understand traditional NLP limits
Older models processed words one by one or in small groups, missing full sentence meaning.
Step 2: Recognize transformer's key feature
Transformers look at all words together, capturing context better.
Final Answer:
Because they consider the whole sentence context at once -> Option B
Quick Check:
Context awareness = whole-sentence view [OK]
- Thinking transformers process words one at a time
- Believing transformers ignore word order
- Confusing transformers with rule-based systems
Solution
Step 1: Recall attention purpose
Attention helps the model decide which words matter more in a sentence.
Step 2: Match description to attention
Assigning weights to words matches how attention works.
Final Answer:
It focuses on important words by assigning weights to them -> Option C
Quick Check:
Attention = weighted focus [OK]
- Thinking attention ignores words randomly
- Believing attention removes punctuation
- Confusing attention with translation
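The "assigning weights to words" idea in the solution above can be sketched as scaled dot-product attention for a single query. This is a simplified, pure-Python illustration with made-up 2-d vectors; real transformers use learned projections, many heads, and batched tensor operations.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical 2-d vectors for three words; the numbers are illustrative only.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]

weights, output = attend(query, keys, values)
print([round(w, 3) for w in weights])   # one weight per word, summing to 1
print([round(o, 3) for o in output])    # weighted mix of the value vectors
```

Words whose keys align with the query receive larger weights, so they contribute more to the output.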
import torch
from torch.nn import MultiheadAttention
input_tensor = torch.rand(3, 2, 4)  # seq_len, batch_size, embed_dim
attention = MultiheadAttention(embed_dim=4, num_heads=2)
output, _ = attention(input_tensor, input_tensor, input_tensor)
print(output.shape)
Solution
Step 1: Understand input shape format
Input shape is (seq_len=3, batch_size=2, embed_dim=4), the default layout (batch_first=False) expected by PyTorch MultiheadAttention.
Step 2: Check output shape from attention
Output shape matches input shape: (seq_len, batch_size, embed_dim) = (3, 2, 4).
Final Answer:
torch.Size([3, 2, 4]) -> Option A
Quick Check:
Output shape = input shape [OK]
- Mixing batch and sequence dimensions
- Assuming output shape changes embed dimension
- Confusing PyTorch input format with batch-first
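To illustrate the last mistake in the list above, this sketch contrasts the default sequence-first layout with batch_first=True. It assumes a PyTorch version that supports the batch_first argument on MultiheadAttention; the variable names are illustrative.

```python
import torch
from torch.nn import MultiheadAttention

batch_size, seq_len, embed_dim = 2, 3, 4

# Default layout: (seq_len, batch_size, embed_dim).
seq_first = MultiheadAttention(embed_dim=embed_dim, num_heads=2)
x = torch.rand(seq_len, batch_size, embed_dim)
out, _ = seq_first(x, x, x)
print(out.shape)       # torch.Size([3, 2, 4])

# batch_first=True layout: (batch_size, seq_len, embed_dim).
batch_first = MultiheadAttention(embed_dim=embed_dim, num_heads=2, batch_first=True)
y = torch.rand(batch_size, seq_len, embed_dim)
out_bf, weights = batch_first(y, y, y)
print(out_bf.shape)    # torch.Size([2, 3, 4])
print(weights.shape)   # torch.Size([2, 3, 3]): one weight per (query, key) pair
```

In both layouts the output keeps the input's shape; only the order of the first two dimensions differs.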
from transformers import BertModel
model = BertModel()
output = model("Hello world")
Solution
Step 1: Check input type for BertModel
BertModel expects token IDs (numbers), not raw text strings.
Step 2: Identify correct input preparation
Text must be tokenized using a tokenizer before passing to the model.
Final Answer:
BertModel requires tokenized input, not raw text -> Option D
Quick Check:
Tokenize text before model input [OK]
- Passing raw strings directly to model
- Assuming model auto-tokenizes input
- Ignoring need for attention masks
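To show what "tokenized input" means, here is a toy sketch that turns text into token IDs and an attention mask. It is not the real BERT tokenizer: the vocabulary, the whitespace splitting, and the encode function are hypothetical stand-ins, whereas real BERT tokenizers use a learned WordPiece vocabulary. The output field names input_ids and attention_mask mirror what Hugging Face tokenizers return.

```python
# Hypothetical toy vocabulary; a real tokenizer ships with its own learned vocabulary.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4, "world": 5}

def encode(text, max_len=6):
    """Map text to fixed-length input_ids plus an attention_mask."""
    tokens = ["[CLS]"] + text.lower().split() + ["[SEP]"]
    # Unknown words map to [UNK]; overly long inputs are simply cut off here.
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens][:max_len]
    mask = [1] * len(ids)            # 1 = real token
    padding = max_len - len(ids)     # 0 = padding the model should ignore
    return {
        "input_ids": ids + [VOCAB["[PAD]"]] * padding,
        "attention_mask": mask + [0] * padding,
    }

print(encode("Hello world"))
# {'input_ids': [2, 4, 5, 3, 0, 0], 'attention_mask': [1, 1, 1, 1, 0, 0]}
```

The model consumes these integer IDs (as tensors), and the attention mask tells it which positions are padding.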
Solution
Step 1: Understand chatbot context needs
Chatbots must remember and relate words across long conversations.
Step 2: Identify transformer feature for long context
Self-attention lets the model connect all words, even far apart, in one pass.
Final Answer:
Self-attention mechanism that relates all words in the input -> Option A
Quick Check:
Self-attention = long context handling [OK]
- Thinking transformers read text in small fixed windows
- Believing transformers ignore previous sentences
- Confusing dictionary lookup with learning
