Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to create a TransformerDecoderLayer with 8 attention heads.
PyTorch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=[1])
Common Mistakes
Using a number of heads that does not divide d_model evenly.
Confusing nhead with number of layers.
The TransformerDecoderLayer requires the number of attention heads as nhead. 8 is a common choice for d_model=512.
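A minimal runnable sketch of the completed line, with the blank filled as nhead=8 per the explanation above:

```python
import torch.nn as nn

# nhead=8 divides d_model=512 evenly (64 dimensions per head).
decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
```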
2. Fill in the blank (medium)
Complete the code to pass the memory tensor to the TransformerDecoder.
PyTorch
import torch
import torch.nn as nn

memory = torch.rand(10, 32, 512)  # (sequence_length, batch_size, d_model)
decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
output = decoder(tgt=torch.rand(20, 32, 512), memory=[1])
Common Mistakes
Passing the target tensor instead of memory.
Passing an undefined variable.
The memory argument is the output from the encoder and must be passed to the decoder as the memory parameter.
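A runnable sketch of the completed code, with the blank filled by the encoder output `memory` as the explanation describes:

```python
import torch
import torch.nn as nn

memory = torch.rand(10, 32, 512)  # encoder output: (src_len, batch, d_model)
tgt = torch.rand(20, 32, 512)     # target sequence: (tgt_len, batch, d_model)

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)

# The blank takes the encoder output, passed as the memory parameter.
output = decoder(tgt=tgt, memory=memory)  # output shape matches tgt
```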
3. Fill in the blank (hard)
Fix the error in the code by selecting the correct mask to prevent the decoder from attending to future tokens.
PyTorch
import torch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
tgt = torch.rand(20, 32, 512)
memory = torch.rand(10, 32, 512)
size = tgt.size(0)
mask = torch.triu(torch.ones(size, size), diagonal=[1]).bool()
output = decoder(tgt, memory, tgt_mask=mask)
Common Mistakes
Using diagonal=0, which masks the current token as well.
Using a negative diagonal, which also masks the current token and earlier positions.
The mask should block attention to future tokens by masking the upper triangle above the main diagonal, so diagonal=1 is correct.
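A small sketch showing what the mask looks like with diagonal=1, the value given by the explanation above:

```python
import torch

# With diagonal=1 the main diagonal stays False (unmasked): each position
# can attend to itself and to earlier positions, while the True entries
# above the diagonal block attention to future tokens.
size = 4
mask = torch.triu(torch.ones(size, size), diagonal=1).bool()
# mask[0] is [False, True, True, True]: token 0 attends only to itself.
```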
4. Fill in the blank (hard)
Fill both blanks to create a TransformerDecoderLayer with dropout and ReLU activation.
PyTorch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8, dropout=[1], activation=[2])
Common Mistakes
Passing dropout as an integer instead of a float probability.
Passing the activation name without quotes instead of as a string.
A dropout of 0.1 is common, and ReLU activation is specified as the string "relu".
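A runnable sketch with both blanks filled as the explanation suggests (dropout=0.1, activation="relu"):

```python
import torch.nn as nn

# dropout takes a float probability; activation takes the string "relu".
decoder_layer = nn.TransformerDecoderLayer(
    d_model=512, nhead=8, dropout=0.1, activation="relu"
)
```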
5. Fill in the blank (hard)
Fill all three blanks to create a TransformerDecoder, pass the target and memory, and apply the correct mask.
PyTorch
import torch
import torch.nn as nn

memory = torch.rand(15, 64, 256)
tgt = torch.rand(30, 64, 256)
decoder_layer = nn.TransformerDecoderLayer(d_model=256, nhead=4)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=[1])
size = tgt.size(0)
mask = torch.triu(torch.ones(size, size), diagonal=[2]).bool()
output = decoder(tgt=[3], memory=memory, tgt_mask=mask)
Common Mistakes
Using the wrong number of layers.
Setting mask diagonal to 0 or 2.
Passing memory as tgt.
The decoder has 5 layers, the mask diagonal is 1 to block future tokens, and tgt is passed as the target tensor.
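A runnable sketch of the fully completed pipeline, with all three blanks filled per the explanation above (num_layers=5, diagonal=1, tgt=tgt):

```python
import torch
import torch.nn as nn

memory = torch.rand(15, 64, 256)  # encoder output
tgt = torch.rand(30, 64, 256)     # target sequence

decoder_layer = nn.TransformerDecoderLayer(d_model=256, nhead=4)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=5)  # 5 stacked layers

size = tgt.size(0)
# diagonal=1 blocks attention to future tokens while keeping the current one.
mask = torch.triu(torch.ones(size, size), diagonal=1).bool()

output = decoder(tgt=tgt, memory=memory, tgt_mask=mask)  # output shape matches tgt
```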