Challenge - 5 Problems
Transformer Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate · 2:00 remaining
Key innovation of transformers in NLP
Which feature of transformers most directly allows them to handle long-range dependencies in text better than previous models?
Attempts: 2 left
💡 Hint
Think about how the model can look at all words at once instead of one by one.
✗ Incorrect
Transformers use self-attention to consider all words in a sentence at the same time, enabling them to capture relationships between distant words effectively.
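The explanation above can be sketched in a few lines of PyTorch. This is a minimal single-head illustration (the function name `self_attention` and the toy shapes are ours, not part of the challenge): every row of the softmaxed score matrix weights *all* positions, so token 0 can attend directly to token 4 without stepping through the tokens in between.

```python
import torch
import torch.nn.functional as F

def self_attention(x):
    # x: (seq_len, d). Scores compare every token with every other token.
    scores = x @ x.transpose(-2, -1) / (x.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)  # each row sums to 1 over all positions
    return weights @ x                   # weighted mix of the whole sequence

x = torch.rand(5, 16)        # 5 tokens, 16-dim embeddings
out = self_attention(x)      # shape (5, 16): same sequence, contextualized
```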
❓ Model Choice
intermediate · 2:00 remaining
Choosing a model architecture for NLP tasks
You want to build a model that understands context in long documents for summarization. Which model architecture is best suited?
Attempts: 2 left
💡 Hint
Consider which model can capture relationships across long text spans.
✗ Incorrect
Transformers excel at capturing context over long sequences thanks to their self-attention mechanism, unlike RNNs and CNNs, which struggle with long-range dependencies.
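In PyTorch this choice is one class away. A hedged sketch (the toy `d_model`, layer count, and 512-token "document" are illustrative assumptions): a stacked transformer encoder lets every one of the 512 positions attend to every other in a single layer.

```python
import torch
import torch.nn as nn

# Toy setup: encode a 512-token "document" of 32-dim embeddings.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

doc = torch.rand(1, 512, 32)  # (batch, seq_len, d_model)
out = encoder(doc)            # shape preserved: (1, 512, 32)
```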
❓ Metrics
advanced · 2:00 remaining
Evaluating transformer model performance
After training a transformer for language translation, which metric best measures how well the model's output matches human translations?
Attempts: 2 left
💡 Hint
Think about a metric designed for comparing sentences in translation tasks.
✗ Incorrect
BLEU score measures the overlap between machine-generated and human reference translations, making it suitable for evaluating translation quality.
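The overlap BLEU measures is clipped n-gram precision. A simplified sketch (real BLEU averages up to 4-grams over a whole corpus; the two-gram toy below and the helper name `bleu` are ours):

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=2):
    # Geometric mean of clipped n-gram precisions, times a brevity penalty.
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
score = bleu(cand, ref)  # high unigram overlap, partial bigram overlap
```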
🔧 Debug
advanced · 2:00 remaining
Identifying a common transformer training issue
You trained a transformer model but notice the training loss does not decrease and stays very high. Which issue is most likely causing this?
Attempts: 2 left
💡 Hint
Consider what happens if the model weights update too aggressively.
✗ Incorrect
A learning rate that is too high can cause the model to overshoot minima, preventing loss from decreasing.
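The overshoot is easy to see on a toy loss. For f(w) = w², the update is w ← w − lr·2w, which shrinks |w| only when |1 − 2·lr| < 1; the learning rates below are illustrative:

```python
def train(lr, steps=20, w=1.0):
    # Gradient descent on f(w) = w**2, whose gradient is 2*w.
    for _ in range(steps):
        w = w - lr * 2 * w
    return abs(w)

low = train(lr=0.1)   # multiplier 0.8 per step: converges toward w = 0
high = train(lr=1.5)  # multiplier -2 per step: |w| doubles, loss explodes
```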
❓ Predict Output
expert · 3:00 remaining
Output shape of transformer attention scores
Given the following PyTorch code snippet for a transformer attention layer, what is the shape of the 'attention_scores' tensor?
NLP
import torch

batch_size = 2
seq_len = 5
embed_dim = 16
num_heads = 4

# Query, Key tensors
Q = torch.rand(batch_size, num_heads, seq_len, embed_dim // num_heads)
K = torch.rand(batch_size, num_heads, seq_len, embed_dim // num_heads)

# Compute attention scores
attention_scores = torch.matmul(Q, K.transpose(-2, -1))
Attempts: 2 left
💡 Hint
Recall that attention scores are computed by multiplying queries and keys along the embedding dimension.
✗ Incorrect
Q has shape (batch_size, num_heads, seq_len, head_dim). Multiplying Q by K^T along the last two dims results in (batch_size, num_heads, seq_len, seq_len).
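A runnable check of that shape reasoning, extended one step further (the scaling, softmax, and value tensor `V` are our additions, following the standard scaled dot-product attention rather than anything in the snippet):

```python
import torch
import torch.nn.functional as F

batch_size, num_heads, seq_len, head_dim = 2, 4, 5, 4
Q = torch.rand(batch_size, num_heads, seq_len, head_dim)
K = torch.rand(batch_size, num_heads, seq_len, head_dim)
V = torch.rand(batch_size, num_heads, seq_len, head_dim)

# matmul over the last two dims: (seq_len, head_dim) @ (head_dim, seq_len)
attention_scores = torch.matmul(Q, K.transpose(-2, -1))  # (2, 4, 5, 5)

# Scale, normalize, and apply to values to recover the input shape.
weights = F.softmax(attention_scores / head_dim ** 0.5, dim=-1)
out = torch.matmul(weights, V)  # (2, 4, 5, 4)
```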