NLP · ML · ~20 mins

Why transformers revolutionized NLP - Challenge Your Understanding

Challenge - 5 Problems
Problem 1: 🧠 Conceptual (intermediate)
Key innovation of transformers in NLP
Which feature of transformers most directly allows them to handle long-range dependencies in text better than previous models?
A. Convolutional layers that capture local patterns in fixed windows
B. Use of recurrent connections to process sequences step by step
C. Predefined fixed-length context windows for input sequences
D. A self-attention mechanism that weighs all words in a sentence simultaneously
💡 Hint: Think about how the model can look at all words at once instead of one by one.
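The hint above can be made concrete with a minimal sketch (plain Python, made-up toy values): dot-product attention scores one query against every position in the sequence simultaneously, rather than stepping through tokens one at a time as an RNN would.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Score one query against ALL keys at once (dot products),
    then normalize into a weight per token."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# Toy 2-dimensional embeddings for a 4-token sentence (illustrative values).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]

weights = attention_weights(query, keys)
print(weights)  # one weight per token, summing to 1
```

Every token gets a weight in a single step; there is no recurrence and no fixed local window, which is why long-range dependencies are no harder to model than nearby ones.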
Problem 2: Model Choice (intermediate)
Choosing a model architecture for NLP tasks
You want to build a model that understands context in long documents for summarization. Which model architecture is best suited?
A. Recurrent Neural Network (RNN) with LSTM cells
B. Transformer with self-attention layers
C. Convolutional Neural Network (CNN) with small kernels
D. Simple feedforward neural network
💡 Hint: Consider which model can capture relationships across long text spans.
Problem 3: Metrics (advanced)
Evaluating transformer model performance
After training a transformer for language translation, which metric best measures how well the model's output matches human translations?
A. BLEU score comparing generated and reference sentences
B. Accuracy of predicting the next word
C. Mean squared error between word embeddings
D. Confusion matrix of predicted classes
💡 Hint: Think about a metric designed for comparing sentences in translation tasks.
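As a rough illustration of the kind of metric the hint points at (this is not the full BLEU algorithm, which combines clipped n-gram precisions up to 4-grams with a brevity penalty), here is a sketch of clipped unigram precision between a candidate translation and a reference:

```python
from collections import Counter

def clipped_unigram_precision(candidate, reference):
    """Fraction of candidate words that also appear in the reference,
    with counts clipped so repeating a word cannot inflate the score."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    matched = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    return matched / max(1, sum(cand_counts.values()))

reference = "the cat sat on the mat"
candidate = "the cat sat on a mat"

score = clipped_unigram_precision(candidate, reference)
print(score)  # 5 of 6 candidate words match the reference
```

Note how the comparison is between whole generated and reference sentences, which is exactly what next-word accuracy, embedding MSE, or a classification confusion matrix do not measure.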
Problem 4: 🔧 Debug (advanced)
Identifying a common transformer training issue
You trained a transformer model but notice the training loss does not decrease and stays very high. Which issue is most likely causing this?
A. Using a batch size that is too large, causing slow convergence
B. Not using dropout layers, causing overfitting
C. Using a learning rate that is too high, causing unstable updates
D. Applying layer normalization after the output layer
💡 Hint: Consider what happens if the model weights update too aggressively.
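The hint can be demonstrated on a toy problem (a minimal sketch, no deep learning library needed): gradient descent on f(w) = w² converges with a small step size, but once the learning rate exceeds the stable range the updates overshoot and the loss grows instead of shrinking.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 using its gradient f'(w) = 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2*lr)
    return w

good = gradient_descent(lr=0.1)  # factor 0.8 per step: |w| shrinks toward 0
bad = gradient_descent(lr=1.5)   # factor -2 per step: |w| doubles every step
print(abs(good), abs(bad))
```

The same overshooting happens per parameter in a transformer, which is why a too-high learning rate leaves the training loss stuck high (or diverging) from the start.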
Problem 5: Predict Output (expert)
Output shape of transformer attention scores
Given the following PyTorch code snippet for a transformer attention layer, what is the shape of the 'attention_scores' tensor?
import torch
batch_size = 2
seq_len = 5
embed_dim = 16
num_heads = 4

# Query, Key tensors: one slice of size embed_dim // num_heads per attention head
Q = torch.rand(batch_size, num_heads, seq_len, embed_dim // num_heads)
K = torch.rand(batch_size, num_heads, seq_len, embed_dim // num_heads)

# Compute attention scores
attention_scores = torch.matmul(Q, K.transpose(-2, -1))
A. torch.Size([2, 4, 5, 5])
B. torch.Size([2, 4, 16, 16])
C. torch.Size([2, 16, 5, 5])
D. torch.Size([2, 5, 4, 4])
💡 Hint: Recall that attention scores are computed by multiplying queries and keys along the per-head embedding dimension.
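You can check your prediction without running PyTorch by applying the batched matmul rule (sketched here in plain Python): leading batch dimensions are carried through unchanged, and the trailing (..., m, k) @ (..., k, n) pair contracts to (..., m, n).

```python
def matmul_shape(a_shape, b_shape):
    """Result shape of a batched matrix multiply:
    leading (batch) dims are kept, (..., m, k) @ (..., k, n) -> (..., m, n)."""
    *batch_a, m, k1 = a_shape
    *batch_b, k2, n = b_shape
    assert batch_a == batch_b and k1 == k2, "shapes not compatible"
    return (*batch_a, m, n)

batch_size, num_heads, seq_len, head_dim = 2, 4, 5, 16 // 4
q_shape = (batch_size, num_heads, seq_len, head_dim)
# K.transpose(-2, -1) swaps the last two dimensions of K:
kt_shape = (batch_size, num_heads, head_dim, seq_len)

print(matmul_shape(q_shape, kt_shape))  # -> (2, 4, 5, 5)
```

The head dimension is contracted away, leaving one score per (query position, key position) pair for each head in each batch element.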