Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to create a multi-head attention layer with 8 heads.

PyTorch
import torch.nn as nn

multihead_attn = nn.MultiheadAttention(embed_dim=64, num_heads=[1])
Common Mistakes
Choosing a number of heads that does not divide the embedding dimension evenly.
Using too few or too many heads without considering model size.
Explanation
The number of heads in multi-head attention is set by the num_heads parameter. Here, 8 is the correct choice because embed_dim=64 divides evenly into 8 heads of 8 dimensions each.
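Filled in, the layer from this task can be sketched as follows, with an explicit check for the divisibility requirement noted under Common Mistakes:

```python
import torch.nn as nn

embed_dim = 64
num_heads = 8
# num_heads must divide embed_dim evenly; each head then works on
# embed_dim // num_heads = 8 dimensions
assert embed_dim % num_heads == 0

multihead_attn = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=num_heads)
```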
2. Fill in the blank (medium)
Complete the code to apply multi-head attention on query, key, and value tensors.

PyTorch
output, attn_weights = multihead_attn([1], key, value)
Common Mistakes
Passing key or value as the first argument instead of query.
Confusing the order of inputs.
Explanation
The first argument to multihead_attn is the query tensor, which is used to compute attention scores.
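As a self-contained sketch of the full call with this argument order (the tensor shapes are illustrative, using PyTorch's default (seq_len, batch, embed_dim) layout):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
multihead_attn = nn.MultiheadAttention(embed_dim=64, num_heads=8)

# Default layout is (seq_len, batch, embed_dim); the query may have a
# different sequence length than the key/value pair.
query = torch.randn(5, 2, 64)
key = torch.randn(7, 2, 64)
value = torch.randn(7, 2, 64)

# query comes first; it determines the output's sequence length
output, attn_weights = multihead_attn(query, key, value)
# output: (5, 2, 64); attn_weights (averaged over heads by default):
# (batch, query_len, key_len) = (2, 5, 7)
```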
3. Fill in the blank (hard)
Fix the error in the code to correctly reshape the output of multi-head attention.

PyTorch
batch_size, seq_len, embed_dim = x.size()
output = output.transpose(0, 1).reshape([1], seq_len, embed_dim)
Common Mistakes
Using seq_len instead of batch_size in reshape.
Not transposing before reshaping.
Explanation
The output tensor from multihead_attn has shape (seq_len, batch_size, embed_dim). Transposing dims 0 and 1 swaps seq_len and batch_size, so reshaping should use batch_size as the first dimension.
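A runnable sketch of the corrected reshape, where a random tensor stands in for the attention output and the shapes are illustrative:

```python
import torch

seq_len, batch_size, embed_dim = 5, 2, 64
# multihead_attn returns output shaped (seq_len, batch_size, embed_dim)
output = torch.randn(seq_len, batch_size, embed_dim)

# Swap dims 0 and 1 first, then reshape with batch_size as the leading dim
output = output.transpose(0, 1).reshape(batch_size, seq_len, embed_dim)
```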
4. Fill in the blank (hard)
Fill both blanks to create a mask that prevents attention to future tokens.

PyTorch
import torch

seq_len = 5
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=[1]) == [2]
Common Mistakes
Using diagonal=0 which masks the main diagonal too.
Comparing to True instead of 0.
Explanation
To mask future tokens, we use torch.triu with diagonal=1 to select the upper triangle strictly above the main diagonal. Comparing to 0 then creates a boolean mask where True marks the attendable positions (the current token and earlier ones).
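Completed, the mask looks like this (note that if you pass a boolean attn_mask to nn.MultiheadAttention, PyTorch uses the opposite convention there, where True marks positions that may NOT be attended, so this mask would be inverted first):

```python
import torch

seq_len = 5
# triu with diagonal=1 keeps ones strictly above the main diagonal
# (the future positions); comparing to 0 flips it so that
# True = "may attend" (self and past positions).
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1) == 0
```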
5. Fill in the blank (hard)
Fill all three blanks to compute scaled dot-product attention manually.

PyTorch
import torch
import torch.nn.functional as F
import math

d_k = query.size(-1)
scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt([1])
attn = F.softmax(scores, dim=[2])
output = torch.matmul(attn, [3])
Common Mistakes
Using query instead of value in the last multiplication.
Applying softmax over wrong dimension.
Forgetting to scale scores.
Explanation
Scaled dot-product attention divides the scores by sqrt(d_k), applies softmax over the last dimension (dim=-1), then multiplies by value to get the output.
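Putting the three filled blanks together, a minimal sketch (the function wrapper and tensor shapes are illustrative, not part of the exercise):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    d_k = query.size(-1)  # dimension of the query/key vectors
    # Q·K^T scaled by sqrt(d_k) to keep score magnitudes stable
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    # Softmax over the last dim normalizes weights across key positions
    attn = F.softmax(scores, dim=-1)
    # Weighted sum of the value vectors
    return torch.matmul(attn, value), attn

torch.manual_seed(0)
query = torch.randn(2, 5, 64)
key = torch.randn(2, 7, 64)
value = torch.randn(2, 7, 64)
output, attn = scaled_dot_product_attention(query, key, value)
# output: (2, 5, 64); each row of attn sums to 1
```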