PyTorch · ~20 mins

Transformer decoder in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️ Transformer Decoder Mastery
Get all five challenges correct, under time pressure, to earn this badge!
Predict Output · intermediate · 2:00
Output shape of Transformer decoder layer
Given the following PyTorch code snippet for a Transformer decoder layer, what is the shape of the output tensor?
PyTorch
import torch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
memory = torch.rand(10, 32, 512)  # (sequence_length, batch_size, d_model)
tgt = torch.rand(20, 32, 512)     # (target_sequence_length, batch_size, d_model)
output = decoder_layer(tgt, memory)
print(output.shape)
A. torch.Size([20, 32, 512])
B. torch.Size([10, 32, 512])
C. torch.Size([32, 20, 512])
D. torch.Size([20, 512, 32])
Attempts: 2 left
💡 Hint
Remember the output shape matches the target input shape for the decoder layer.
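To see this for yourself, here is a minimal sketch (same sizes as the snippet above) that checks the decoder layer's output shape directly:

```python
import torch
import torch.nn as nn

# The decoder layer's output follows the *target* tensor's shape, not memory's.
decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
memory = torch.rand(10, 32, 512)   # (src_len, batch, d_model)
tgt = torch.rand(20, 32, 512)      # (tgt_len, batch, d_model)

output = decoder_layer(tgt, memory)
assert output.shape == tgt.shape   # torch.Size([20, 32, 512])
```

Cross-attention reads from `memory`, but every query comes from a target position, so the sequence length of the result is always the target's.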
🧠 Conceptual · intermediate · 1:30
Purpose of the memory input in Transformer decoder
In a Transformer decoder, what is the role of the 'memory' input?
A. It stores the previous decoder outputs for autoregressive generation.
B. It provides the encoded information from the encoder to guide decoding.
C. It contains the positional encodings for the target sequence.
D. It is used to compute the loss during training.
Attempts: 2 left
💡 Hint
Think about how the decoder uses information from the encoder.
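A quick sketch (with assumed toy sizes) of where `memory` typically comes from in an encoder-decoder setup: it is the encoder's output, handed to the decoder so its cross-attention can attend over the encoded source sequence.

```python
import torch
import torch.nn as nn

# Toy sizes, chosen for illustration only.
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
decoder_layer = nn.TransformerDecoderLayer(d_model=64, nhead=4)

src = torch.rand(10, 2, 64)       # source sequence
memory = encoder(src)             # encoded source == the decoder's "memory"
tgt = torch.rand(5, 2, 64)        # target sequence so far
out = decoder_layer(tgt, memory)  # cross-attention reads from memory
assert out.shape == tgt.shape
```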
Hyperparameter · advanced · 1:30
Effect of increasing nhead in Transformer decoder
What is the effect of increasing the 'nhead' parameter in a Transformer decoder layer?
A. It increases the batch size during training.
B. It increases the size of the feedforward network inside the decoder layer.
C. It increases the number of decoder layers stacked.
D. It increases the number of attention heads, allowing the model to focus on more representation subspaces.
Attempts: 2 left
💡 Hint
Recall what 'nhead' controls in multi-head attention.
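One way to build intuition: `nhead` splits `d_model` into that many subspaces, each of size `d_model // nhead` (so `nhead` must divide `d_model` evenly). A small sketch:

```python
import torch
import torch.nn as nn

d_model = 512
for nhead in (4, 8, 16):          # each must divide d_model evenly
    layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead)
    head_dim = d_model // nhead   # per-head subspace size
    # More heads -> more, smaller representation subspaces; the total
    # attention width stays d_model either way.
    assert nhead * head_dim == d_model
```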
🔧 Debug · advanced · 2:00
Identifying error in Transformer decoder usage
What error will this code raise when running the Transformer decoder layer?
PyTorch
import torch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=256, nhead=4)
memory = torch.rand(15, 16, 256)
tgt = torch.rand(15, 16, 128)  # Incorrect d_model size
output = decoder_layer(tgt, memory)
A. No error, code runs successfully
B. SyntaxError because of missing parentheses
C. RuntimeError due to mismatched feature dimensions between tgt and memory
D. TypeError because tgt is not a tensor
Attempts: 2 left
💡 Hint
Check the last dimension sizes of tgt and memory tensors.
Metrics · expert · 2:30
Choosing the best metric for Transformer decoder language generation
Which metric is most appropriate to evaluate the quality of text generated by a Transformer decoder in a language translation task?
A. BLEU score measuring n-gram overlap with reference translations
B. Mean Squared Error between generated and reference token embeddings
C. Accuracy of predicting the next token in the sequence
D. F1 score computed on binary classification labels
Attempts: 2 left
💡 Hint
Think about how translation quality is commonly measured.
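To make the n-gram-overlap idea concrete, here is a deliberately simplified sketch of BLEU's core ingredient: modified unigram precision (real BLEU also uses higher-order n-grams and a brevity penalty; in practice you would use a library such as sacrebleu rather than this toy function):

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Modified unigram precision: the fraction of candidate tokens
    that also appear in the reference, with clipped counts."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    return overlap / max(1, sum(cand.values()))

score = unigram_precision("the cat sat on the mat",
                          "the cat is on the mat")
print(score)  # 5 of 6 candidate unigrams match -> ~0.833
```

The clipping (`min(c, ref[w])`) is what stops a candidate from scoring well by repeating one matching word many times.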