NLP / ML · ~10 mins

The Attention Mechanism in Depth (NLP): Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to compute the attention scores using a dot product.

import torch

query = torch.randn(1, 5)
key = torch.randn(1, 5)

attention_scores = torch.matmul(query, [1].T)
print(attention_scores)
Options:
A. value
B. key
C. weights
D. query
Common Mistakes
Using query instead of key for the dot product.
Not transposing the key matrix before multiplication.
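For reference, here is the completed snippet with the blank filled by `key` (as the common-mistakes notes imply), with the dot product annotated:

```python
import torch

query = torch.randn(1, 5)
key = torch.randn(1, 5)

# Dot-product score: query (1, 5) times key^T (5, 1) yields a (1, 1) score
attention_scores = torch.matmul(query, key.T)
print(attention_scores.shape)
```

Transposing `key` is what makes the inner dimensions line up; without `.T` the shapes (1, 5) and (1, 5) cannot be multiplied.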
Task 2: Fill in the blank (medium)

Complete the code to apply softmax to the attention scores to get attention weights.

import torch
import torch.nn.functional as F

attention_scores = torch.tensor([[1.0, 2.0, 3.0]])
attention_weights = F.[1](attention_scores, dim=-1)
print(attention_weights)
Options:
A. relu
B. sigmoid
C. softmax
D. tanh
Common Mistakes
Using sigmoid instead of softmax, which does not normalize across the dimension.
Applying activation functions like relu or tanh which do not produce probabilities.
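For reference, the completed snippet uses `softmax` (the only option that normalizes scores into a probability distribution, as the common-mistakes notes point out):

```python
import torch
import torch.nn.functional as F

attention_scores = torch.tensor([[1.0, 2.0, 3.0]])
# softmax exponentiates and normalizes along dim=-1, so the weights sum to 1
attention_weights = F.softmax(attention_scores, dim=-1)
print(attention_weights)
```

Sigmoid squashes each score independently into (0, 1) but does not make the row sum to 1, which is why it is listed as a common mistake.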
Task 3: Fill in the blank (hard)

Fix the error in the code to compute the weighted sum of values using attention weights.

import torch

attention_weights = torch.tensor([[0.1, 0.7, 0.2]])
values = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
weighted_sum = torch.matmul([1], values)
print(weighted_sum)
Options:
A. values
B. values.T
C. attention_weights.T
D. attention_weights
Common Mistakes
Transposing attention weights causing shape mismatch.
Multiplying values by values instead of attention weights.
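For reference, the correct fill is `attention_weights` without a transpose: a (1, 3) weight row times a (3, 3) value matrix gives a (1, 3) weighted sum. With identity values, the result reproduces the weights themselves, which makes the computation easy to sanity-check:

```python
import torch

attention_weights = torch.tensor([[0.1, 0.7, 0.2]])
values = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
# (1, 3) @ (3, 3) -> (1, 3); each output row is a weighted mix of value rows
weighted_sum = torch.matmul(attention_weights, values)
print(weighted_sum)
```

Transposing the weights would give a (3, 1) matrix, which cannot be multiplied against the (3, 3) values, matching the shape-mismatch mistake noted above.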
Task 4: Fill in the blank (hard)

Fill both blanks to scale the attention scores and apply softmax.

import torch
import torch.nn.functional as F

query = torch.randn(1, 64)
key = torch.randn(10, 64)

scores = torch.matmul(query, key.T) / [1]
attention_weights = F.[2](scores, dim=-1)
print(attention_weights)
Options:
A. 64**0.5
B. 64
C. softmax
D. sigmoid
Common Mistakes
Using sigmoid instead of softmax for attention weights.
Not scaling scores causing unstable training.
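For reference, the completed snippet divides by `64**0.5` (the square root of the key dimension, i.e. sqrt(d_k)) and applies `softmax`, as the options and common-mistakes notes indicate:

```python
import torch
import torch.nn.functional as F

query = torch.randn(1, 64)
key = torch.randn(10, 64)

# Scale by sqrt(d_k) = 64**0.5 so the score variance stays near 1,
# preventing softmax from saturating as the key dimension grows
scores = torch.matmul(query, key.T) / 64**0.5
attention_weights = F.softmax(scores, dim=-1)
print(attention_weights.shape)
```

Dividing by 64 (the raw dimension rather than its square root) would over-shrink the scores; the sqrt keeps the variance of the dot products roughly constant.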
Task 5: Fill in the blank (hard)

Fill all three blanks to implement multi-head attention output concatenation and projection.

import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.linear_out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        batch_size, seq_len, embed_dim = x.size()
        # Assume x is already split into heads and attention applied
        concat_heads = x.reshape(batch_size, seq_len, [1])
        output = self.linear_out([2])
        return output

attention = MultiHeadAttention(embed_dim=128, num_heads=8)
x = torch.randn(2, 10, 128)
result = attention(x)
print(result.shape)
Options:
A. embed_dim
B. concat_heads
D. x
Common Mistakes
Using wrong dimension in reshape causing size mismatch.
Passing wrong variable to linear_out layer.
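For reference, the completed module reshapes to `embed_dim` and passes `concat_heads` to the output projection, consistent with the options and common-mistakes notes:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.linear_out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        batch_size, seq_len, embed_dim = x.size()
        # Assume x is already split into heads and attention applied;
        # concatenating the heads restores the full embed_dim
        concat_heads = x.reshape(batch_size, seq_len, embed_dim)
        # The final linear projection mixes information across heads
        output = self.linear_out(concat_heads)
        return output

attention = MultiHeadAttention(embed_dim=128, num_heads=8)
x = torch.randn(2, 10, 128)
result = attention(x)
print(result.shape)
```

Reshaping to anything other than `embed_dim` (e.g. `head_dim`) would break the size match with `linear_out`, which expects an input of width `embed_dim`.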