Complete the code to compute the attention scores using dot product.
attention_scores = query [1] key.transpose(-2, -1)
The attention scores are computed as the dot product of the query and key vectors. In Python, the '@' operator performs matrix multiplication.
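For intuition (not part of the exercise answer key), here is a minimal pure-Python sketch with hypothetical toy values, showing what the dot product of a single query row against the keys computes:

```python
# Hypothetical toy values: one query vector and two key vectors.
query = [1.0, 0.0]                     # shape (d_k,)
keys = [[1.0, 0.0], [0.0, 1.0]]        # shape (num_keys, d_k)

# Dot product of the query with each key -- the per-row equivalent of
# query @ key.transpose(-2, -1) in the tensor version.
attention_scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
print(attention_scores)  # -> [1.0, 0.0]
```

The query aligns perfectly with the first key and not at all with the second, so the first score is highest.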
Complete the code to apply softmax to the attention scores along the last dimension.
attention_weights = torch.nn.functional.softmax(attention_scores, dim=[1])
Softmax is applied along the last dimension (-1) to convert the scores into probabilities over the keys.
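As a stdlib-only sketch of what softmax does along one dimension (toy scores, no torch), with the usual max-subtraction trick for numerical stability:

```python
import math

def softmax(scores):
    """Softmax over a 1-D list of scores (the 'last dimension' case)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax([1.0, 0.0])
print(weights)  # two probabilities that sum to 1; the higher score gets more mass
```

The higher score maps to the larger probability, which is exactly how attention weights favor better-matching keys.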
Fix the error in scaling the attention scores by the square root of the key dimension.
scaled_scores = attention_scores / torch.sqrt(torch.tensor([1], dtype=torch.float32))
The scaling factor is the square root of the key vector dimension, i.e. the size of the key tensor's last dimension.
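A tiny numeric sketch of the scaling step (hypothetical d_k and scores, stdlib only):

```python
import math

d_k = 4                 # hypothetical key dimension
scores = [2.0, 4.0]     # hypothetical raw attention scores
scaled = [s / math.sqrt(d_k) for s in scores]  # divide by sqrt(d_k)
print(scaled)  # -> [1.0, 2.0]
```

Dividing by sqrt(d_k) keeps the scores from growing with the key dimension, which would otherwise push softmax into a near-one-hot regime with vanishing gradients.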
Fill both blanks to compute the attention output by multiplying attention weights with values.
attention_output = torch.matmul([1], [2])
The attention output is the weighted sum of the value vectors, so we multiply attention weights by values.
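Putting the steps above together, here is a hedged end-to-end sketch of single-query scaled dot-product attention in pure Python (toy shapes and values, no torch; a sketch of the mechanism, not a reference implementation):

```python
import math

def scaled_dot_product_attention(query, keys, values):
    """Attention for one query vector over lists of key/value vectors."""
    d_k = len(keys[0])
    # 1. Scores: dot product of the query with each key, scaled by sqrt(d_k).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    # 2. Softmax over the scores (max-subtracted for stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # 3. Output: weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

out = scaled_dot_product_attention([1.0, 0.0],
                                   [[1.0, 0.0], [0.0, 1.0]],
                                   [[10.0, 0.0], [0.0, 10.0]])
print(out)  # more weight lands on the first value vector
```

Because the query matches the first key better, the first value vector dominates the weighted sum; the components of the output still sum to 10 since the weights sum to 1.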
Fill all three blanks to create a dictionary comprehension that maps each word to its length if length is greater than 3.
lengths = { [1] : [2] for [3] in words if len([3]) > 3 }
This dictionary comprehension creates a mapping from each word to its length, but only for words longer than 3 characters.
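For reference, one possible filled-in version of the comprehension, run on a hypothetical word list:

```python
words = ["the", "attention", "is", "all", "you", "need"]  # hypothetical input

# word -> len(word), keeping only words longer than 3 characters.
lengths = {word: len(word) for word in words if len(word) > 3}
print(lengths)  # -> {'attention': 9, 'need': 4}
```

Words of length 3 or less ("the", "is", "all", "you") are filtered out by the `if` clause before any key-value pair is built.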