PyTorch · ~10 mins

Self-attention mechanism in PyTorch - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to create a linear layer for query vectors in self-attention.

PyTorch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embed_size):
        super(SelfAttention, self).__init__()
        self.query = nn.Linear(embed_size, [1])
A. 1
B. 2 * embed_size
C. embed_size // 2
D. embed_size
Common Mistakes
Using a different output size than embed_size causes shape mismatch errors.
Setting output size to 1 loses information needed for attention.
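A quick shape check makes both mistakes concrete (the batch size, sequence length, and `embed_size` below are illustrative, not taken from the tasks):

```python
import torch
import torch.nn as nn

embed_size = 8
x = torch.randn(2, 5, embed_size)  # (batch, seq_len, embed_size)

# Projecting to embed_size preserves the last dimension, so the
# resulting queries can be compared against same-sized keys.
query_proj = nn.Linear(embed_size, embed_size)
q = query_proj(x)
print(q.shape)  # torch.Size([2, 5, 8])

# Projecting to 1 collapses each token to a single number,
# discarding the information attention needs.
bad_proj = nn.Linear(embed_size, 1)
print(bad_proj(x).shape)  # torch.Size([2, 5, 1])
```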
Task 2: Fill in the blank (medium)

Complete the code to compute the attention scores using matrix multiplication.

PyTorch
import torch

def forward(self, values, keys, queries):
    scores = torch.matmul(queries, [1].transpose(-2, -1))
    return scores
A. values
B. keys
C. scores
D. queries
Common Mistakes
Multiplying queries with values instead of keys.
Not transposing keys before multiplication.
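The transpose is what lines the shapes up: queries of shape `(batch, seq_len, d_k)` need keys reshaped to `(batch, d_k, seq_len)` so the product yields one score per query-key pair. A minimal shape check (sizes are illustrative):

```python
import torch

queries = torch.randn(2, 5, 8)  # (batch, seq_len, d_k)
keys = torch.randn(2, 5, 8)

# (2, 5, 8) @ (2, 8, 5) -> (2, 5, 5): a score for every query-key pair.
# Without .transpose(-2, -1), the inner dimensions (8 and 5) would
# not match and matmul would raise a shape error.
scores = torch.matmul(queries, keys.transpose(-2, -1))
print(scores.shape)  # torch.Size([2, 5, 5])
```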
Task 3: Fill in the blank (hard)

Fix the error in scaling the attention scores by the square root of the key dimension.

PyTorch
import torch
import math

def forward(self, values, keys, queries):
    d_k = keys.size(-1)
    scores = torch.matmul(queries, keys.transpose(-2, -1))
    scaled_scores = scores / math.sqrt([1])
    return scaled_scores
A. d_k
B. queries.size(-1)
C. values.size(-1)
D. keys.size(0)
Common Mistakes
Using queries size instead of keys size for scaling.
Using batch size or wrong dimension for scaling.
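The reason for dividing by the square root of the key dimension: raw dot products grow with `d_k`, and large scores push softmax into a near-one-hot regime with tiny gradients. A small demonstration (the `d_k` value here is illustrative):

```python
import math
import torch

torch.manual_seed(0)
d_k = 64
queries = torch.randn(2, 5, d_k)
keys = torch.randn(2, 5, d_k)

scores = torch.matmul(queries, keys.transpose(-2, -1))
scaled = scores / math.sqrt(d_k)

# Unscaled scores have a spread that grows like sqrt(d_k);
# dividing by sqrt(d_k) brings their variance back near 1.
print(scores.std().item(), scaled.std().item())
```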
Task 4: Fill in the blank (hard)

Fill both blanks to apply softmax on scaled scores and multiply by values to get attention output.

PyTorch
import math
import torch
import torch.nn.functional as F

def forward(self, values, keys, queries):
    d_k = keys.size(-1)
    scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt(d_k)
    attention = F.[1](scores, dim=[2])
    output = torch.matmul(attention, values)
    return output
A. softmax
B. -1
C. relu
D. tanh
Common Mistakes
Using sigmoid or relu instead of softmax for attention weights.
Applying softmax on wrong dimension.
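Why the dimension matters: the scores have shape `(batch, queries, keys)`, and each query's weights over all keys must sum to 1, so softmax must normalize along the last (key) axis. A quick check with illustrative shapes:

```python
import torch
import torch.nn.functional as F

scores = torch.randn(2, 5, 5)  # (batch, queries, keys)

# dim=-1 normalizes over the key axis: for each query position,
# the attention weights across all keys sum to 1.
attention = F.softmax(scores, dim=-1)
print(attention.sum(dim=-1))  # every entry is 1.0
```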
Task 5: Fill in the blank (hard)

Fill all three blanks to complete a self-attention forward pass with query, key, value projections and output.

PyTorch
def forward(self, x):
    queries = self.query(x)
    keys = self.key(x)
    values = self.value(x)
    d_k = keys.size(-1)
    scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt([1])
    attention = torch.nn.functional.[2](scores, dim=[3])
    out = torch.matmul(attention, values)
    return out
A. d_k
B. softmax
C. -1
D. embed_size
Common Mistakes
Using embed_size instead of d_k for scaling.
Using wrong activation instead of softmax.
Applying softmax on wrong dimension.
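For checking your work after attempting the tasks, here is the full single-head self-attention module that the five tasks build up, assembled in one place (the `embed_size` of 8 and input shapes are illustrative):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Single-head self-attention, combining the steps from tasks 1-5."""

    def __init__(self, embed_size):
        super().__init__()
        # Task 1: projections preserve the embedding dimension.
        self.query = nn.Linear(embed_size, embed_size)
        self.key = nn.Linear(embed_size, embed_size)
        self.value = nn.Linear(embed_size, embed_size)

    def forward(self, x):
        queries = self.query(x)
        keys = self.key(x)
        values = self.value(x)
        # Tasks 2-3: query-key scores, scaled by sqrt of the key dimension.
        d_k = keys.size(-1)
        scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt(d_k)
        # Task 4: softmax over the key axis, then weight the values.
        attention = F.softmax(scores, dim=-1)
        return torch.matmul(attention, values)

attn = SelfAttention(embed_size=8)
out = attn(torch.randn(2, 5, 8))  # (batch, seq_len, embed_size)
print(out.shape)  # torch.Size([2, 5, 8])
```

The output has the same shape as the input, which is what lets self-attention layers be stacked.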