Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to create a simple attention score using the dot product.
PyTorch
import torch

query = torch.tensor([[1, 0, 1]], dtype=torch.float32)
key = torch.tensor([[0, 1, 0]], dtype=torch.float32)
attention_score = torch.matmul(query, [1].T)
print(attention_score)
Common Mistakes
Using query instead of key for the dot product.
Not transposing the key tensor before multiplication.
The attention score is calculated by the dot product of the query and the transpose of the key.
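Based on the explanation above, a completed version of the snippet might look like this (the blank filled with `key`, which is what the explanation implies):

```python
import torch

query = torch.tensor([[1, 0, 1]], dtype=torch.float32)
key = torch.tensor([[0, 1, 0]], dtype=torch.float32)

# Dot product of the query with the transpose of the key
attention_score = torch.matmul(query, key.T)
print(attention_score)  # these vectors are orthogonal, so the score is 0
```

Here the query and key share no active dimensions, so their dot product (and hence the attention score) is 0.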
2. Fill in the blank (medium)
Complete the code to apply softmax to the attention scores.
PyTorch
import torch
import torch.nn.functional as F

scores = torch.tensor([[1.0, 2.0, 3.0]])
attention_weights = F.[1](scores, dim=1)
print(attention_weights)
Common Mistakes
Using sigmoid which does not normalize across the dimension.
Using relu or tanh which do not produce probability distributions.
Softmax converts raw scores into probabilities that sum to 1, which is essential for attention weights.
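A completed sketch of the snippet, with the blank filled by `softmax` as the explanation describes:

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[1.0, 2.0, 3.0]])

# softmax normalizes the raw scores into a probability distribution
attention_weights = F.softmax(scores, dim=1)
print(attention_weights)
```

The weights sum to 1 along `dim=1`, and the largest raw score (3.0) receives the largest weight.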
3. Fill in the blank (hard)
Fix the error in the scaled dot-product attention calculation.
PyTorch
import torch
import math

query = torch.randn(1, 4)
key = torch.randn(1, 4)
scale = math.sqrt(query.shape[1])
scores = torch.matmul(query, key.T) / [1]
print(scores)
Common Mistakes
Using batch size dimension instead of feature dimension for scaling.
Using key shape dimension incorrectly.
The scale factor is the square root of the key/query dimension, which is query.shape[1].
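A completed sketch, dividing by the precomputed `scale` (the square root of the feature dimension); a fixed seed is added here only for reproducibility:

```python
import torch
import math

torch.manual_seed(0)  # added for reproducibility; not part of the exercise
query = torch.randn(1, 4)
key = torch.randn(1, 4)

# Scale by sqrt of the feature dimension (index 1), not the batch dimension
scale = math.sqrt(query.shape[1])  # sqrt(4) = 2.0
scores = torch.matmul(query, key.T) / scale
print(scores)
```

Scaling by the square root of the feature dimension keeps the dot products from growing with dimensionality, which would otherwise push softmax into a near one-hot regime.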
4. Fill in the blank (hard)
Fill both blanks to complete the attention output calculation.
PyTorch
import torch
import torch.nn.functional as F

query = torch.randn(1, 3)
key = torch.randn(1, 3)
value = torch.randn(1, 3)
scores = torch.matmul(query, key.T) / torch.sqrt(torch.tensor(query.shape[[1]], dtype=torch.float32))
weights = F.softmax(scores, dim=[2])
output = torch.matmul(weights, value)
print(output)
Common Mistakes
Using wrong shape index for scaling.
Applying softmax on wrong dimension.
The feature dimension is at index 1 of the shape, and softmax is applied along the last dimension (dim=1, or equivalently -1) so that each query's attention weights sum to 1.
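A completed sketch of the full attention output: scaling uses shape index 1 (the feature dimension), and softmax runs over the last dimension so each query's weights sum to 1. A fixed seed is added here only for reproducibility:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # added for reproducibility; not part of the exercise
query = torch.randn(1, 3)
key = torch.randn(1, 3)
value = torch.randn(1, 3)

# Scale by sqrt of the feature dimension (shape index 1)
scores = torch.matmul(query, key.T) / torch.sqrt(
    torch.tensor(query.shape[1], dtype=torch.float32))
# Softmax over the last dimension: weights per query sum to 1
weights = F.softmax(scores, dim=-1)
# Attention output is a weighted sum of the value vectors
output = torch.matmul(weights, value)
print(output)
```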
5. Fill in the blank (hard)
Fill all three blanks to implement a simple attention mechanism output.
PyTorch
import torch
import torch.nn.functional as F

query = torch.randn(2, 4)
key = torch.randn(2, 4)
value = torch.randn(2, 4)
scores = torch.matmul(query, key.T) / torch.sqrt(torch.tensor(query.shape[[1]], dtype=torch.float32))
weights = F.softmax(scores, dim=[2])
attention_output = torch.matmul(weights, [3])
print(attention_output)
Common Mistakes
Using key instead of value for final multiplication.
Incorrect dimension indices for scaling or softmax.
Scale by the feature dimension (shape index 1), apply softmax along the last dimension (dim=1) so each row of weights sums to 1, and multiply the weights by value to get the output.
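Putting all three blanks together, a completed sketch (feature-dimension scaling, softmax over the last dimension, and a final multiply by `value`); a fixed seed is added here only for reproducibility:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # added for reproducibility; not part of the exercise
query = torch.randn(2, 4)
key = torch.randn(2, 4)
value = torch.randn(2, 4)

# Scale scores by sqrt of the feature dimension (shape index 1)
scores = torch.matmul(query, key.T) / torch.sqrt(
    torch.tensor(query.shape[1], dtype=torch.float32))
# Softmax over the last dimension: each query's weights sum to 1
weights = F.softmax(scores, dim=-1)
# Multiply by value (not key) to get the attended output
attention_output = torch.matmul(weights, value)
print(attention_output)
```

With two queries and two keys, `weights` is a 2x2 matrix whose rows each sum to 1, and the output keeps the shape of `value`, (2, 4).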