Challenge - 5 Problems
Transformer Decoder Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate · 2:00 remaining
Output shape of Transformer decoder layer
Given the following PyTorch code snippet for a Transformer decoder layer, what is the shape of the output tensor?
PyTorch
import torch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
memory = torch.rand(10, 32, 512)  # (sequence_length, batch_size, d_model)
tgt = torch.rand(20, 32, 512)     # (target_sequence_length, batch_size, d_model)
output = decoder_layer(tgt, memory)
print(output.shape)
Attempts: 2 left
💡 Hint
Remember the output shape matches the target input shape for the decoder layer.
✓ Explanation
The Transformer decoder layer outputs a tensor with the same shape as the target input: (target_sequence_length, batch_size, d_model). Here, tgt has shape (20, 32, 512), so output shape is (20, 32, 512).
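A quick way to convince yourself of this rule, sketched here with arbitrary sequence lengths: vary the memory length and observe that the output shape tracks tgt, not memory.

```python
import torch
import torch.nn as nn

# Minimal sketch of the shape rule: the decoder layer's output always
# matches tgt's shape, regardless of memory's sequence length.
decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
tgt = torch.rand(20, 32, 512)  # (target_seq_len, batch, d_model)

for src_len in (10, 7):        # memory length varies; output shape does not
    memory = torch.rand(src_len, 32, 512)
    out = decoder_layer(tgt, memory)
    assert out.shape == tgt.shape  # torch.Size([20, 32, 512]) both times
```

Note that without `batch_first=True`, PyTorch expects the (sequence, batch, feature) layout used above.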
🧠 Conceptual
intermediate · 1:30 remaining
Purpose of the memory input in Transformer decoder
In a Transformer decoder, what is the role of the 'memory' input?
Attempts: 2 left
💡 Hint
Think about how the decoder uses information from the encoder.
✓ Explanation
The 'memory' input is the output of the encoder. The decoder attends to this memory to incorporate encoded source information when generating outputs.
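The usual encoder-decoder wiring can be sketched as follows (the dimensions are illustrative choices, not requirements): the encoder's output tensor is passed to the decoder as `memory`, which the decoder reads via cross-attention.

```python
import torch
import torch.nn as nn

# Sketch of the standard pipeline: encoder output becomes the
# decoder's 'memory', consumed by the decoder's cross-attention.
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
decoder_layer = nn.TransformerDecoderLayer(d_model=64, nhead=4)

src = torch.rand(12, 8, 64)       # source sequence (src_len, batch, d_model)
memory = encoder(src)             # encoded source = 'memory'
tgt = torch.rand(5, 8, 64)        # target sequence so far
out = decoder_layer(tgt, memory)  # cross-attention reads memory
print(out.shape)                  # follows tgt: (5, 8, 64)
```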
❓ Hyperparameter
advanced · 1:30 remaining
Effect of increasing nhead in Transformer decoder
What is the effect of increasing the 'nhead' parameter in a Transformer decoder layer?
Attempts: 2 left
💡 Hint
Recall what 'nhead' controls in multi-head attention.
✓ Explanation
'nhead' specifies how many parallel attention heads the model uses. More heads allow the model to attend to different parts of the input simultaneously.
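A small sketch of the trade-off: d_model is split evenly across heads, so each head sees d_model // nhead features, and d_model must be divisible by nhead (the exact exception type for an invalid combination varies across PyTorch versions).

```python
import torch.nn as nn

# Sketch: nhead splits d_model into parallel attention heads.
# Each head attends over d_model // nhead features, so more heads
# means more (narrower) attention patterns, not more parameters.
for nhead in (2, 4, 8):
    layer = nn.TransformerDecoderLayer(d_model=512, nhead=nhead)
    print(nhead, 512 // nhead)  # per-head feature size: 256, 128, 64

# An indivisible combination is rejected at construction time:
try:
    nn.TransformerDecoderLayer(d_model=512, nhead=6)
except (AssertionError, ValueError):
    print("d_model must be divisible by nhead")
```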
🔧 Debug
advanced · 2:00 remaining
Identifying error in Transformer decoder usage
What error will this code raise when running the Transformer decoder layer?
PyTorch
import torch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=256, nhead=4)
memory = torch.rand(15, 16, 256)
tgt = torch.rand(15, 16, 128)  # Incorrect d_model size
output = decoder_layer(tgt, memory)
Attempts: 2 left
💡 Hint
Check the last dimension sizes of tgt and memory tensors.
✓ Explanation
Both tgt and memory must have the layer's feature dimension (d_model=256). Here tgt has only 128 features, so the first operation applied to tgt, the self-attention input projection, fails on the size mismatch and the call raises a RuntimeError.
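A sketch reproducing the failure and the fix. The exception is caught broadly here, since PyTorch versions differ in whether this mismatch surfaces as a RuntimeError or an AssertionError:

```python
import torch
import torch.nn as nn

# Sketch of the bug: tgt's feature dimension must equal the layer's
# d_model. The 128-feature tgt cannot pass through the 256-dim
# self-attention projection, so the forward call fails.
decoder_layer = nn.TransformerDecoderLayer(d_model=256, nhead=4)
memory = torch.rand(15, 16, 256)
bad_tgt = torch.rand(15, 16, 128)

try:
    decoder_layer(bad_tgt, memory)
except (RuntimeError, AssertionError) as e:
    print("failed as expected:", type(e).__name__)

# Fix: give tgt the layer's feature size, d_model=256.
good_tgt = torch.rand(15, 16, 256)
out = decoder_layer(good_tgt, memory)
print(out.shape)  # (15, 16, 256), matching good_tgt
```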
❓ Metrics
expert · 2:30 remaining
Choosing the best metric for Transformer decoder language generation
Which metric is most appropriate to evaluate the quality of text generated by a Transformer decoder in a language translation task?
Attempts: 2 left
💡 Hint
Think about how translation quality is commonly measured.
✓ Explanation
BLEU score compares the generated text to reference translations by measuring n-gram overlaps, which is standard for translation tasks.
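To make the n-gram-overlap idea concrete, here is a minimal pure-Python sketch of modified n-gram precision, the core ingredient of BLEU. Real BLEU additionally combines several n-gram orders geometrically and applies a brevity penalty; libraries such as NLTK or sacrebleu implement the full metric.

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Modified n-gram precision: each candidate n-gram counts only
    up to the number of times it appears in the reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(ngram_precision(cand, ref, n=1))  # 5 of 6 unigrams match: 0.8333...
```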