Complete the code to define the encoder embedding layer.
self.embedding = nn.[1](input_dim, embed_dim)
The encoder uses an embedding layer to convert input tokens into dense vectors.
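A minimal sketch of the completed line: the blank resolves to nn.Embedding, which maps integer token ids to learned dense vectors. The sizes below are illustrative; in the exercise, input_dim is the vocabulary size and embed_dim the embedding width.

```python
import torch
import torch.nn as nn

input_dim, embed_dim = 1000, 64            # illustrative vocabulary and embedding sizes
embedding = nn.Embedding(input_dim, embed_dim)

tokens = torch.tensor([[5, 42, 7]])        # (batch=1, seq_len=3) of token ids
vectors = embedding(tokens)                # (1, 3, 64): one vector per token
print(vectors.shape)                       # torch.Size([1, 3, 64])
```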
Complete the code to compute attention weights using softmax.
attn_weights = F.[1](scores, dim=1)
Softmax converts the raw scores into probabilities that sum to 1; these probabilities are the attention weights.
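A short sketch, assuming scores has shape (batch, seq_len) with one raw score per encoder position; softmax along dim=1 normalizes across those positions.

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[2.0, 1.0, 0.1]])   # (batch=1, seq_len=3) raw attention scores
attn_weights = F.softmax(scores, dim=1)    # probabilities over encoder positions
print(attn_weights.sum(dim=1))             # each row sums to 1
```

Note that dim=1 is the sequence dimension here; softmax over the wrong dimension would normalize across the batch instead.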
Fix the error in the decoder's context vector calculation.
context = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs).[1](1)
After the batch matrix multiplication, squeeze(1) removes the length-1 dimension, leaving a (batch, hidden) context vector.
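A sketch of the corrected line, assuming attn_weights is (batch, seq_len) and encoder_outputs is (batch, seq_len, hidden); the sizes are illustrative.

```python
import torch

batch, seq_len, hidden = 2, 5, 8
attn_weights = torch.softmax(torch.randn(batch, seq_len), dim=1)
encoder_outputs = torch.randn(batch, seq_len, hidden)

# unsqueeze(1): (batch, 1, seq_len); bmm -> (batch, 1, hidden); squeeze(1) -> (batch, hidden)
context = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs).squeeze(1)
print(context.shape)                       # torch.Size([2, 8])
```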
Fill both blanks to complete the attention score calculation using dot product.
scores = torch.bmm(encoder_outputs, decoder_hidden.[1](1, 2, 0)).[2](2)
Permute rearranges the decoder hidden state's dimensions to align it for batch matrix multiplication, and squeeze drops the trailing length-1 dimension so the scores have shape (batch, seq_len).
Fill all three blanks to complete the decoder forward pass with attention.
embedded = self.embedding(input_step)
attn_weights = F.softmax(torch.bmm(encoder_outputs, decoder_hidden.[1](1, 2, 0)).[2](2), dim=1)
context = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs)
rnn_input = torch.cat((embedded, context), dim=[3])
Permute rearranges the decoder hidden state for the batch matrix multiplication, squeeze removes the trailing length-1 dimension from the raw scores, and concatenation happens on dimension 2 (the feature dimension).
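The full step can be sketched as a small module; this is one plausible completion (permute, squeeze, and dim=2), assuming batch-first tensors, a single-layer GRU, and illustrative sizes. The class name and dimensions are hypothetical, not from the exercise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnDecoderStep(nn.Module):
    """One decoder step with dot-product attention (illustrative sketch)."""
    def __init__(self, vocab_size=100, embed_dim=16, hidden_dim=16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # RNN input is the embedded token concatenated with the context vector
        self.gru = nn.GRU(embed_dim + hidden_dim, hidden_dim, batch_first=True)

    def forward(self, input_step, decoder_hidden, encoder_outputs):
        embedded = self.embedding(input_step)                        # (B, 1, E)
        scores = torch.bmm(encoder_outputs,
                           decoder_hidden.permute(1, 2, 0))          # (B, T, 1)
        attn_weights = F.softmax(scores.squeeze(2), dim=1)           # (B, T)
        context = torch.bmm(attn_weights.unsqueeze(1),
                            encoder_outputs)                         # (B, 1, H)
        rnn_input = torch.cat((embedded, context), dim=2)            # (B, 1, E+H)
        return self.gru(rnn_input, decoder_hidden)

B, T, H = 2, 5, 16
dec = AttnDecoderStep()
out, hid = dec(torch.tensor([[3], [7]]),      # one input token per batch element
               torch.zeros(1, B, H),          # initial decoder hidden state
               torch.randn(B, T, H))          # encoder outputs
print(out.shape, hid.shape)                   # torch.Size([2, 1, 16]) torch.Size([1, 2, 16])
```

Dot-product attention requires the encoder output size and decoder hidden size to match (both H here); otherwise a learned projection would be needed.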