Complete the code to create an embedding layer for the input sequence.
embedding_layer = nn.Embedding(num_embeddings=[1], embedding_dim=256)
The embedding layer requires the vocabulary size as the number of embeddings.
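A minimal sketch of the completed layer, assuming a hypothetical vocabulary size of 10000 (the exercise's blank expects the vocabulary size here):

```python
import torch
import torch.nn as nn

vocab_size = 10000  # hypothetical; the blank should hold the vocabulary size
embedding_layer = nn.Embedding(num_embeddings=vocab_size, embedding_dim=256)

# A batch of 4 sequences of length 12, with token ids in [0, vocab_size)
tokens = torch.randint(0, vocab_size, (4, 12))
embedded = embedding_layer(tokens)
print(embedded.shape)  # torch.Size([4, 12, 256])
```

Each token id indexes a row of the embedding matrix, so the output adds a trailing dimension of size `embedding_dim`.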
Complete the code to initialize the encoder LSTM with the correct input size.
encoder_lstm = nn.LSTM(input_size=[1], hidden_size=512, batch_first=True)
The LSTM input size should match the embedding dimension, not the vocabulary size.
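A sketch of the completed encoder, assuming the embedding dimension of 256 from the previous step and illustrative batch/sequence sizes:

```python
import torch
import torch.nn as nn

embedding_dim = 256  # must match the embedding layer's embedding_dim
encoder_lstm = nn.LSTM(input_size=embedding_dim, hidden_size=512, batch_first=True)

# Embedded input: (batch, seq_len, embedding_dim)
embedded = torch.randn(4, 12, embedding_dim)
encoder_outputs, (hidden, cell) = encoder_lstm(embedded)
print(encoder_outputs.shape)  # torch.Size([4, 12, 512])
print(hidden.shape)           # torch.Size([1, 4, 512])
```

With `batch_first=True` the LSTM consumes `(batch, seq_len, features)`, so `input_size` must equal the size of the last dimension of the embedded tensor, not the vocabulary size.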
Fix the error in the decoder forward pass by selecting the correct input to the decoder LSTM.
decoder_output, (hidden, cell) = decoder_lstm([1], (hidden, cell))
The decoder LSTM takes the previous decoder input (usually the embedded previous token) as its input.
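A sketch of one decoder step, assuming the dimensions from the earlier steps and zero-initialized state in place of the real encoder state:

```python
import torch
import torch.nn as nn

embedding_dim, hidden_size, vocab_size = 256, 512, 10000  # assumed dimensions
embedding = nn.Embedding(vocab_size, embedding_dim)
decoder_lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size, batch_first=True)

# State carried over from the encoder: (num_layers, batch, hidden_size)
hidden = torch.zeros(1, 4, hidden_size)
cell = torch.zeros(1, 4, hidden_size)

# The previous target token for each sequence, embedded before entering the LSTM
prev_token = torch.randint(0, vocab_size, (4, 1))
embedded_prev = embedding(prev_token)  # (4, 1, 256)

decoder_output, (hidden, cell) = decoder_lstm(embedded_prev, (hidden, cell))
print(decoder_output.shape)  # torch.Size([4, 1, 512])
```

The key point is that the first argument is the embedded previous token, not the hidden state; the hidden and cell states travel through the second argument.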
Fill both blanks to complete the attention score calculation using dot product.
attention_scores = torch.bmm(encoder_outputs, [1].unsqueeze(2)).squeeze(2)
attention_weights = torch.softmax(attention_scores, dim=[2])
The attention scores are computed as the dot product of the encoder outputs with the decoder hidden state. Softmax is applied over the sequence-length dimension (dim=1).
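A sketch of the dot-product attention score calculation with the blanks filled, using illustrative batch and sequence sizes and a `decoder_hidden` tensor standing in for the current decoder hidden state:

```python
import torch

batch, seq_len, hidden_size = 4, 12, 512  # illustrative sizes
encoder_outputs = torch.randn(batch, seq_len, hidden_size)
decoder_hidden = torch.randn(batch, hidden_size)  # current decoder hidden state

# (batch, seq_len, hidden) x (batch, hidden, 1) -> (batch, seq_len, 1) -> (batch, seq_len)
attention_scores = torch.bmm(encoder_outputs, decoder_hidden.unsqueeze(2)).squeeze(2)
attention_weights = torch.softmax(attention_scores, dim=1)  # normalize over seq_len

print(attention_weights.shape)       # torch.Size([4, 12])
print(attention_weights.sum(dim=1))  # each row sums to 1
```

`unsqueeze(2)` turns the hidden state into a column vector so `torch.bmm` performs one dot product per encoder position, and `dim=1` makes the weights a distribution over source positions.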
Fill all three blanks to complete the decoder output calculation with attention context.
context = torch.bmm(attention_weights.unsqueeze(1), [1])
combined = torch.cat((context.squeeze(1), [2]), dim=[3])
output = decoder_fc(combined)
The context vector is the attention-weighted sum of the encoder outputs. It is concatenated with the decoder output along the feature dimension (dim=1) before being passed to the final linear layer.
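A sketch of the completed output calculation, assuming the dimensions used above and random tensors standing in for the real attention weights, encoder outputs, and decoder output:

```python
import torch
import torch.nn as nn

batch, seq_len, hidden_size, vocab_size = 4, 12, 512, 10000  # assumed sizes
encoder_outputs = torch.randn(batch, seq_len, hidden_size)
attention_weights = torch.softmax(torch.randn(batch, seq_len), dim=1)
decoder_output = torch.randn(batch, hidden_size)  # decoder hidden at this step
decoder_fc = nn.Linear(2 * hidden_size, vocab_size)

# (batch, 1, seq_len) x (batch, seq_len, hidden) -> (batch, 1, hidden)
context = torch.bmm(attention_weights.unsqueeze(1), encoder_outputs)
# Concatenate context and decoder output along the feature dimension
combined = torch.cat((context.squeeze(1), decoder_output), dim=1)  # (batch, 2*hidden)
output = decoder_fc(combined)
print(output.shape)  # torch.Size([4, 10000])
```

Note that `decoder_fc` must accept `2 * hidden_size` input features, since the context vector and the decoder output are concatenated before the projection to vocabulary logits.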