Complete the code to create a Transformer encoder layer using PyTorch.
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=[1])
The number of attention heads (nhead) is commonly set to 8 in Transformer models for balanced performance.
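As a worked answer, a minimal runnable sketch with the blank filled in as nhead=8 (the batch_first flag and the example tensor shapes are assumptions for illustration; batch_first requires PyTorch 1.9+):

```python
import torch
import torch.nn as nn

# Encoder layer with the common setting of 8 attention heads.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)

x = torch.randn(2, 10, 512)  # (batch, sequence length, d_model)
out = encoder_layer(x)       # shape is preserved: (2, 10, 512)
```

Note that nhead must divide d_model evenly (512 / 8 = 64 dimensions per head), which is one reason 8 is a common choice at this model size.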
Complete the code to apply positional encoding to the input embeddings.
pos_encoded = embeddings + [1]

Positional encoding is added to embeddings to give the model information about token positions.
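One common way to fill the blank is the sinusoidal encoding from the original Transformer paper; the sketch below is an assumption about the encoding used here (the function name and the example shapes are hypothetical):

```python
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    position = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float()
        * (-torch.log(torch.tensor(10000.0)) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

embeddings = torch.randn(10, 512)  # hypothetical (seq_len, d_model) input
pos_encoded = embeddings + sinusoidal_positional_encoding(10, 512)
```

Because the encoding is simply added element-wise, it must have the same trailing dimensions as the embeddings.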
Fix the error in the multi-head attention call by filling the correct argument.
output, weights = multihead_attn(query, key, [1])

The third argument to multihead_attn is the value tensor, which is used along with query and key.
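A short sketch of the corrected call, with the value tensor supplied as the third argument (the tensor shapes and batch_first setting are assumptions for illustration):

```python
import torch
import torch.nn as nn

multihead_attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

query = torch.randn(2, 5, 512)  # (batch, target length, embed_dim)
key = torch.randn(2, 7, 512)    # (batch, source length, embed_dim)
value = torch.randn(2, 7, 512)  # value must match key's sequence length

# Attention is computed as softmax(QK^T / sqrt(d_k)) V, so all three tensors are required.
output, weights = multihead_attn(query, key, value)
# output: (2, 5, 512); weights: (2, 5, 7), averaged over heads by default
```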
Fill both blanks to complete the Transformer decoder layer initialization.
decoder_layer = nn.TransformerDecoderLayer(d_model=[1], nhead=[2])
The decoder layer uses d_model=512 and nhead=8 to match common Transformer settings.
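With both blanks filled as stated, a minimal forward pass looks like the sketch below (the tgt/memory shapes and batch_first flag are assumptions for illustration):

```python
import torch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8, batch_first=True)

tgt = torch.randn(2, 6, 512)      # decoder input (batch, target length, d_model)
memory = torch.randn(2, 10, 512)  # encoder output (batch, source length, d_model)

# The decoder layer attends over its own input (self-attention)
# and over the encoder output (cross-attention).
out = decoder_layer(tgt, memory)
```

Unlike the encoder layer, the decoder layer's forward pass takes two tensors, because cross-attention needs the encoder's output as its key/value source.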
Fill all three blanks to complete a dictionary comprehension that maps each token to its embedding size if the size is greater than 300 and equal to a target value.
embedding_sizes = {token: [1] for token, size in token_sizes.items() if size [2] 300 and size == [3]}

This comprehension selects tokens with embedding size greater than 300 and exactly 512, mapping each token to its size.
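Filling the blanks per the explanation (size, >, 512), a runnable sketch with a hypothetical token_sizes mapping:

```python
# Hypothetical input data, for illustration only.
token_sizes = {"the": 512, "cat": 256, "sat": 512, "mat": 384}

# Keep tokens whose embedding size is > 300 and exactly 512.
embedding_sizes = {
    token: size
    for token, size in token_sizes.items()
    if size > 300 and size == 512
}
print(embedding_sizes)  # {'the': 512, 'sat': 512}
```

Note that size == 512 already implies size > 300, so the first condition is redundant here; the exercise keeps both only to supply three blanks.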