
Self-attention and multi-head attention in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to compute the attention scores by multiplying queries and keys.

attention_scores = torch.matmul(queries, [1].transpose(-2, -1))
A. queries
B. keys
C. values
D. weights
Common Mistakes:
- Using values instead of keys for the multiplication.
- Not transposing the keys before the multiplication.
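For reference, a minimal runnable sketch of this step in PyTorch (the batch size, sequence length, and key dimension below are illustrative):

```python
import torch

# queries and keys each have shape (batch, seq_len, key_dim)
queries = torch.randn(2, 4, 8)
keys = torch.randn(2, 4, 8)

# Multiply queries by the transposed keys:
# (batch, seq_len, key_dim) x (batch, key_dim, seq_len)
# -> (batch, seq_len, seq_len), one score per query-key pair
attention_scores = torch.matmul(queries, keys.transpose(-2, -1))
```

Transposing the last two dimensions of `keys` is what lines up the `key_dim` axes for the matrix multiply; multiplying by `values` at this stage would compute the wrong quantity.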
Task 2: Fill in the blank (medium)

Complete the code to scale the attention scores by the square root of the key dimension.

scaled_scores = attention_scores / math.sqrt([1])
A. value_dim
B. batch_size
C. key_dim
D. query_dim
Common Mistakes:
- Scaling by the query dimension instead of the key dimension.
- Forgetting to scale the attention scores at all.
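A minimal sketch of the scaling step (the tensor shapes and `key_dim` value are illustrative):

```python
import math
import torch

key_dim = 8
# Raw scores as produced by the query-key multiplication
attention_scores = torch.randn(2, 4, 4)

# Divide by sqrt(key_dim) to keep score magnitudes stable;
# without this, large key dimensions push softmax into
# near-one-hot regions with tiny gradients.
scaled_scores = attention_scores / math.sqrt(key_dim)
```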
Task 3: Fill in the blank (hard)

Complete the code to apply softmax to the attention scores along the correct dimension.

attention_weights = torch.nn.functional.softmax(attention_scores, dim=[1])
A. -1
B. 1
C. 0
D. -2
Common Mistakes:
- Applying softmax along the batch or query dimension.
- Using dim=1, which is typically the query (sequence-length) dimension, not the key dimension.
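A minimal sketch of the softmax step (shapes are illustrative). Softmax must run over the last dimension, the key axis, so that each query's weights over all keys sum to 1:

```python
import torch
import torch.nn.functional as F

# Scores of shape (batch, seq_len_queries, seq_len_keys)
attention_scores = torch.randn(2, 4, 4)

# dim=-1 normalizes across keys for each query position
attention_weights = F.softmax(attention_scores, dim=-1)
```

With `dim=0` or `dim=1` the normalization would instead run across the batch or across query positions, producing weights that do not sum to 1 per query.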
Task 4: Fill in the blank (hard)

Fill both blanks to compute multi-head attention output by concatenating heads and applying a linear layer.

multihead_output = self.linear_out(torch.cat([1], dim=[2]))
A. attended_heads
B. 1
C. -1
D. heads
Common Mistakes:
- Concatenating along the batch dimension.
- Using the wrong variable name for the attended heads.
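A minimal sketch of the concatenate-and-project step. The names `attended_heads` and `linear_out` mirror the task's code; the head count and dimensions below are illustrative:

```python
import torch
import torch.nn as nn

batch, seq_len, head_dim, num_heads = 2, 4, 8, 3

# One attended output per head, each (batch, seq_len, head_dim)
attended_heads = [torch.randn(batch, seq_len, head_dim) for _ in range(num_heads)]

# Output projection maps the concatenated heads back to the model width
linear_out = nn.Linear(num_heads * head_dim, num_heads * head_dim)

# Concatenate along the feature (last) dimension, then project;
# concatenating along dim=0 would wrongly stack heads into the batch axis
multihead_output = linear_out(torch.cat(attended_heads, dim=-1))
```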
Task 5: Fill in the blank (hard)

Fill all three blanks to implement scaled dot-product attention: compute scores, apply softmax, and multiply by values.

scores = torch.matmul(queries, [1].transpose(-2, -1)) / math.sqrt([2])
weights = torch.nn.functional.softmax(scores, dim=[3])
output = torch.matmul(weights, values)
A. keys
B. key_dim
C. -1
D. queries
Common Mistakes:
- Using values instead of keys when computing the scores.
- Applying softmax along the wrong dimension.
- Forgetting to scale the scores.
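Putting the three steps together, a minimal runnable sketch of scaled dot-product attention (the function name and tensor shapes are illustrative):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(queries, keys, values):
    # 1. Scores: query-key similarity, scaled by sqrt(key_dim)
    key_dim = keys.size(-1)
    scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt(key_dim)
    # 2. Weights: softmax over the key dimension (last axis)
    weights = F.softmax(scores, dim=-1)
    # 3. Output: weighted sum of the values
    return torch.matmul(weights, values)

q = torch.randn(2, 4, 8)
k = torch.randn(2, 4, 8)
v = torch.randn(2, 4, 8)
out = scaled_dot_product_attention(q, k, v)  # (batch, seq_len, value_dim)
```

The output has the same shape as `values` per query position: each row is a softmax-weighted average of the value vectors.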