Practice

1. Why did transformers change the way machines understand language in NLP?

easy

A. Because they use simple rules without learning

B. Because they consider the whole sentence context at once

C. Because they only look at one word at a time

D. Because they ignore word order completely

Solution

Step 1: Understand traditional NLP limits
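Older models such as RNNs and n-gram models processed words one by one or in small groups, so they often missed relationships between distant words.
Step 2: Recognize transformer's key feature
Transformers use self-attention to look at all words together, so each word is interpreted in the context of the whole sentence.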
Final Answer:
Because they consider the whole sentence context at once -> Option B
Quick Check:
Whole-sentence context = B [OK]

Hint: Transformers see all words together, not one by one [OK]

Common Mistakes:

Thinking transformers process words one at a time
Believing transformers ignore word order
Confusing transformers with rule-based systems

2. Which of the following is the correct way to describe the transformer's attention mechanism?

easy

A. It randomly selects words to ignore

B. It translates words without looking at context

C. It focuses on important words by assigning weights to them

D. It removes all punctuation before processing

Solution

Step 1: Recall attention purpose
Attention helps the model decide which words matter more in a sentence.
Step 2: Match description to attention
Assigning weights to words matches how attention works.
Final Answer:
It focuses on important words by assigning weights to them -> Option C
Quick Check:
Attention = weighted focus [OK]

Hint: Attention means weighting important words higher [OK]

Common Mistakes:

Thinking attention ignores words randomly
Believing attention removes punctuation
Confusing attention with translation
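
To make "assigning weights" concrete, here is a minimal sketch of scaled dot-product attention in plain Python. The three words and their 2-dimensional vectors are hypothetical values chosen for illustration; a real transformer learns much larger vectors and uses separate learned projections for queries, keys, and values.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability, then normalize to sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Score each key by its dot product with the query, scaled by sqrt(dim).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

def attend(query, keys, values):
    # The output is the weighted average of the value vectors.
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hypothetical toy vectors, one per word (assumption for illustration only).
words = ["the", "cat", "sat"]
keys = [[0.1, 0.0], [1.0, 1.0], [0.5, 0.2]]
values = keys
query = [1.0, 1.0]  # query vector for the word "cat"

weights = attention_weights(query, keys)
for word, w in zip(words, weights):
    print(f"{word}: {w:.2f}")
```

The weights always sum to 1, and the word whose key is most similar to the query receives the largest weight. No word is dropped; less relevant words simply contribute less to the output.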

3. Given this simplified transformer attention code snippet, what is the output shape when batch_size=2, seq_len=3, and embed_dim=4?

import torch
from torch.nn import MultiheadAttention

input_tensor = torch.rand(3, 2, 4)  # seq_len, batch_size, embed_dim
attention = MultiheadAttention(embed_dim=4, num_heads=2)
output, _ = attention(input_tensor, input_tensor, input_tensor)
print(output.shape)

medium

A. torch.Size([3, 2, 4])

B. torch.Size([2, 3, 4])

C. torch.Size([3, 4, 2])

D. torch.Size([2, 4, 3])

Solution

Step 1: Understand input shape format
Input shape is (seq_len=3, batch_size=2, embed_dim=4), the default layout for PyTorch MultiheadAttention (batch_first=False).
Step 2: Check output shape from attention
In self-attention the output has the same shape as the query: (seq_len, batch_size, embed_dim) = (3, 2, 4).
Final Answer:
torch.Size([3, 2, 4]) -> Option A
Quick Check:
Output shape = input shape [OK]

Hint: Output shape matches input shape in PyTorch attention [OK]

Common Mistakes:

Mixing batch and sequence dimensions
Assuming output shape changes embed dimension
Confusing PyTorch's default sequence-first format with batch-first
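
If you prefer the batch-first layout, the sketch below shows one way to get it, assuming a PyTorch version that supports the `batch_first` argument of `MultiheadAttention`. It also prints the shape of the returned attention weights, which by default are averaged over heads.

```python
import torch
from torch.nn import MultiheadAttention

batch_size, seq_len, embed_dim = 2, 3, 4

# batch_first=True makes the layer accept (batch_size, seq_len, embed_dim).
attention = MultiheadAttention(embed_dim=embed_dim, num_heads=2, batch_first=True)
x = torch.rand(batch_size, seq_len, embed_dim)

# Self-attention: the same tensor serves as query, key, and value.
output, weights = attention(x, x, x)
print(output.shape)   # torch.Size([2, 3, 4])
print(weights.shape)  # torch.Size([2, 3, 3])
```

The weights tensor has one seq_len x seq_len matrix per batch item: each position gets a weight for every position in the sequence.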

4. This code loads a pretrained transformer model but throws an error when the model is called. What is the mistake?

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
output = model("Hello world")

medium

A. The string input should be a list, not a string

B. BertModel cannot be imported from transformers

C. The model must be trained before use

D. BertModel requires tokenized input, not raw text

Solution

Step 1: Check input type for BertModel
BertModel expects tensors of token IDs (integers), not raw text strings.
Step 2: Identify correct input preparation
Text must first be converted to token IDs with the model's matching tokenizer before it is passed to the model.
Final Answer:
BertModel requires tokenized input, not raw text -> Option D
Quick Check:
Tokenize text before model input [OK]

Hint: Always tokenize text before feeding to transformer models [OK]

Common Mistakes:

Passing raw strings directly to model
Assuming model auto-tokenizes input
Ignoring need for attention masks
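
To show what "tokenized input" looks like, here is a toy sketch in plain Python. The vocabulary and the `toy_tokenize` helper are hypothetical and exist only for illustration; real BERT tokenizers use a learned WordPiece vocabulary with different IDs.

```python
# Hypothetical toy vocabulary (assumption); not BERT's real vocabulary.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4, "world": 5}

def toy_tokenize(text, max_len=6):
    # Add the special start/end tokens and split on whitespace.
    tokens = ["[CLS]"] + text.lower().split() + ["[SEP]"]
    # Map each token to an integer ID; unknown words map to [UNK].
    # This toy version truncates by simply cutting at max_len.
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens][:max_len]
    # The attention mask marks real tokens with 1 and padding with 0.
    attention_mask = [1] * len(ids)
    padding = max_len - len(ids)
    return {
        "input_ids": ids + [VOCAB["[PAD]"]] * padding,
        "attention_mask": attention_mask + [0] * padding,
    }

encoded = toy_tokenize("Hello world")
print(encoded["input_ids"])       # [2, 4, 5, 3, 0, 0]
print(encoded["attention_mask"])  # [1, 1, 1, 1, 0, 0]
```

With the Hugging Face library, the usual fix is to load the matching tokenizer with `AutoTokenizer.from_pretrained(...)`, call it on the text with `return_tensors="pt"`, and pass the result to the model as `model(**inputs)`.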

5. You want to build a chatbot using transformers that can understand long conversations. Which feature of transformers helps handle long context better than older models?

hard

A. Self-attention mechanism that relates all words in the input

B. Using fixed-size windows to read text piece by piece

C. Ignoring previous sentences to focus on current input

D. Replacing words with fixed dictionaries without learning

Solution

Step 1: Understand chatbot context needs
Chatbots must remember and relate words across long conversations.
Step 2: Identify transformer feature for long context
Self-attention lets the model relate any two words in its input directly, even when they are far apart, without stepping through the words in between.
Final Answer:
Self-attention mechanism that relates all words in the input -> Option A
Quick Check:
Self-attention = long context handling [OK]

Hint: Self-attention links all words for long context [OK]

Common Mistakes:

Thinking transformers read text in small fixed windows
Believing transformers ignore previous sentences
Confusing dictionary lookup with learning
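
The difference between option A and option B can be sketched with two connectivity masks. The helper names below are hypothetical and the sequence length is arbitrary; the sketch only shows which positions can see each other, not a full model.

```python
def full_attention_mask(seq_len):
    # Every position may attend to every other position.
    return [[True] * seq_len for _ in range(seq_len)]

def windowed_mask(seq_len, window):
    # Each position only sees neighbors within `window` steps.
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]

seq_len = 8
full = full_attention_mask(seq_len)
local = windowed_mask(seq_len, window=2)

first, last = 0, seq_len - 1
print(full[last][first])   # True: the last word can use the first word directly
print(local[last][first])  # False: the first word is outside the window
```

Keep in mind that real models still have a maximum input length, and the cost of full self-attention grows quadratically with sequence length.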

Epoch	Loss ↓	Accuracy ↑	Observation
1	2.3	0.25	Model starts learning basic word patterns
2	1.8	0.40	Attention helps model focus on important words
3	1.3	0.55	Model better understands sentence structure
4	0.9	0.70	Contextual word meaning improves predictions
5	0.6	0.80	Model captures complex language patterns

Why transformers revolutionized NLP - Model Pipeline Impact
