Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)
Complete the code to create a linear layer for query vectors in self-attention.
PyTorch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embed_size):
        super(SelfAttention, self).__init__()
        self.query = nn.Linear(embed_size, [1])
Common Mistakes
Using a different output size than embed_size causes shape mismatch errors.
Setting output size to 1 loses information needed for attention.
The query linear layer projects the input embedding to the same embedding size for self-attention.
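Filling the blank with embed_size, as the explanation states, gives the sketch below; the tensor sizes are illustrative, not part of the task:

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embed_size):
        super(SelfAttention, self).__init__()
        # Output size matches embed_size so queries keep the embedding shape.
        self.query = nn.Linear(embed_size, embed_size)

x = torch.randn(2, 4, 8)           # (batch, seq_len, embed_size), illustrative sizes
attn = SelfAttention(embed_size=8)
q = attn.query(x)
print(q.shape)                     # torch.Size([2, 4, 8])
```

Because nn.Linear acts on the last dimension, the query tensor keeps the same shape as the input.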
Task 2: Fill in the blank (medium)
Complete the code to compute the attention scores using matrix multiplication.
PyTorch
import torch

def forward(self, values, keys, queries):
    scores = torch.matmul(queries, [1].transpose(-2, -1))
    return scores
Common Mistakes
Multiplying queries with values instead of keys.
Not transposing keys before multiplication.
Attention scores are computed by multiplying queries with the transpose of keys.
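With keys in the blank, the score computation can be checked on its own; the shapes below are assumptions for illustration:

```python
import torch

queries = torch.randn(2, 4, 8)     # (batch, seq_len, d_k), illustrative sizes
keys = torch.randn(2, 4, 8)

# (2, 4, 8) @ (2, 8, 4) -> (2, 4, 4): one score per query/key pair
scores = torch.matmul(queries, keys.transpose(-2, -1))
print(scores.shape)                # torch.Size([2, 4, 4])
```

Transposing the last two dimensions of keys is what makes the matrix product line up: without it, the shapes would not match for matmul.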
Task 3: Fill in the blank (hard)
Fix the error in scaling the attention scores by the square root of the key dimension.
PyTorch
import torch
import math

def forward(self, values, keys, queries):
    d_k = keys.size(-1)
    scores = torch.matmul(queries, keys.transpose(-2, -1))
    scaled_scores = scores / math.sqrt([1])
    return scaled_scores
Common Mistakes
Using queries size instead of keys size for scaling.
Using batch size or wrong dimension for scaling.
The scaling factor is the square root of the key dimension, stored in d_k.
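Filling the blank with d_k, the scaling can be verified numerically; shapes are illustrative:

```python
import math
import torch

queries = torch.randn(2, 4, 8)
keys = torch.randn(2, 4, 8)

d_k = keys.size(-1)                # 8: the key dimension, not batch or seq_len
scores = torch.matmul(queries, keys.transpose(-2, -1))
scaled_scores = scores / math.sqrt(d_k)

# Multiplying back by sqrt(d_k) recovers the unscaled scores.
recovered = scaled_scores * math.sqrt(d_k)
```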
Task 4: Fill in the blank (hard)
Fill both blanks to apply softmax on scaled scores and multiply by values to get attention output.
PyTorch
import math
import torch
import torch.nn.functional as F

def forward(self, values, keys, queries):
    d_k = keys.size(-1)
    scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt(d_k)
    attention = F.[1](scores, dim=[2])
    output = torch.matmul(attention, values)
    return output
Common Mistakes
Using sigmoid or relu instead of softmax for attention weights.
Applying softmax on wrong dimension.
Softmax is applied along the last dimension (dim=-1) to get attention weights.
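With softmax and dim=-1 in the blanks, each query's weights over the keys form a probability distribution; a standalone check, with illustrative shapes:

```python
import math
import torch
import torch.nn.functional as F

values = torch.randn(2, 4, 8)      # illustrative sizes
keys = torch.randn(2, 4, 8)
queries = torch.randn(2, 4, 8)

d_k = keys.size(-1)
scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt(d_k)
attention = F.softmax(scores, dim=-1)      # weights over keys for each query
output = torch.matmul(attention, values)

print(output.shape)                # torch.Size([2, 4, 8])
```

Summing attention over its last dimension gives 1 for every query, which is why sigmoid or relu (whose outputs do not sum to 1) are the wrong choice here.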
Task 5: Fill in the blank (hard)
Fill all three blanks to complete a self-attention forward pass with query, key, value projections and output.
PyTorch
import math
import torch

def forward(self, x):
    queries = self.query(x)
    keys = self.key(x)
    values = self.value(x)
    d_k = keys.size(-1)
    scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt([1])
    attention = torch.nn.functional.[2](scores, dim=[3])
    out = torch.matmul(attention, values)
    return out
Common Mistakes
Using embed_size instead of d_k for scaling.
Using wrong activation instead of softmax.
Applying softmax on wrong dimension.
Use d_k for scaling, softmax for attention weights, and dim=-1 for softmax dimension.
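Putting the three answers together (d_k, softmax, dim=-1) yields a complete single-head module; the key and value layers mirror the query layer from Task 1 and are an assumption of this sketch:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, embed_size):
        super(SelfAttention, self).__init__()
        # All three projections keep the embedding size.
        self.query = nn.Linear(embed_size, embed_size)
        self.key = nn.Linear(embed_size, embed_size)
        self.value = nn.Linear(embed_size, embed_size)

    def forward(self, x):
        queries = self.query(x)
        keys = self.key(x)
        values = self.value(x)
        d_k = keys.size(-1)
        scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt(d_k)
        attention = F.softmax(scores, dim=-1)
        out = torch.matmul(attention, values)
        return out

x = torch.randn(2, 4, 8)           # (batch, seq_len, embed_size), illustrative sizes
out = SelfAttention(embed_size=8)(x)
print(out.shape)                   # torch.Size([2, 4, 8])
```

The output has the same shape as the input, which is what lets self-attention layers be stacked.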