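```python
import torch

# One query's attention weights over three value vectors (they sum to 1)
attention_weights = torch.tensor([[0.1, 0.7, 0.2]])
values = torch.tensor([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

# Weighted sum of the value rows, using the attention weights as coefficients
weighted_sum = torch.matmul(attention_weights, values)
print(weighted_sum)  # tensor([[0.1000, 0.7000, 0.2000]])
```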

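```python
import math

import torch
import torch.nn.functional as F

query = torch.randn(1, 64)
key = torch.randn(10, 64)

# Scaled dot-product scores: divide by the square root of the key dimension (sqrt(64) = 8.0)
scores = torch.matmul(query, key.T) / math.sqrt(key.size(-1))
# Normalize the scores into attention weights that sum to 1 over the 10 keys
attention_weights = F.softmax(scores, dim=-1)
print(attention_weights)
```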

```python
import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Final projection applied after the heads are concatenated
        self.linear_out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        batch_size, seq_len, embed_dim = x.size()
        # Assume x is already split into heads and attention applied
        # Concatenating the heads restores the full width: num_heads * head_dim == embed_dim
        concat_heads = x.reshape(batch_size, seq_len, embed_dim)
        output = self.linear_out(concat_heads)
        return output


attention = MultiHeadAttention(embed_dim=128, num_heads=8)
x = torch.randn(2, 10, 128)
result = attention(x)
print(result.shape)  # torch.Size([2, 10, 128])
```
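The class above assumes the heads have already been split and attended before `forward` runs. The sketch below fills in that part, assuming separate linear projections for queries, keys, and values and omitting masking and dropout. The class name `SketchMultiHeadAttention` and the helper `_split_heads` are hypothetical illustrations, not a library API (PyTorch provides its own `nn.MultiheadAttention`).

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchMultiHeadAttention(nn.Module):
    """Hypothetical minimal self-attention with explicit head splitting (no mask, no dropout)."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Assumption of this sketch: one projection each for queries, keys and values
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.linear_out = nn.Linear(embed_dim, embed_dim)

    def _split_heads(self, t: torch.Tensor) -> torch.Tensor:
        # (batch, seq, embed) -> (batch, heads, seq, head_dim)
        b, s, _ = t.size()
        return t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch_size, seq_len, embed_dim = x.size()
        q = self._split_heads(self.q_proj(x))
        k = self._split_heads(self.k_proj(x))
        v = self._split_heads(self.v_proj(x))
        # Scaled dot-product scores per head: (batch, heads, seq, seq)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        weights = F.softmax(scores, dim=-1)
        heads = weights @ v  # (batch, heads, seq, head_dim)
        # Put heads next to head_dim again, then merge them back into embed_dim
        concat_heads = heads.transpose(1, 2).reshape(batch_size, seq_len, embed_dim)
        return self.linear_out(concat_heads)


torch.manual_seed(0)
attention = SketchMultiHeadAttention(embed_dim=128, num_heads=8)
x = torch.randn(2, 10, 128)
result = attention(x)
print(result.shape)  # torch.Size([2, 10, 128])
```

The `transpose` before `reshape` matters: reshaping `(batch, heads, seq, head_dim)` directly would interleave tokens from different heads instead of concatenating each token's heads.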

Practice


1. What is the main purpose of the attention mechanism in NLP models?

Difficulty: easy

A. To increase the size of the input data

B. To reduce the number of layers in the model

C. To help the model focus on important parts of the input data

D. To randomly shuffle the input tokens

Attention mechanism in depth in NLP - Interactive Code Practice


Solution

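Step 1: Understand attention's role. Attention assigns a weight to each part of the input, so the model can concentrate on the tokens most relevant to the current prediction.

Step 2: Compare options. Only option C describes this; options A, B, and D describe unrelated operations (enlarging the input, removing layers, shuffling tokens).

Final Answer: C. To help the model focus on important parts of the input data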

Quick Check:

Solution

Step 1: Recall attention weight calculation

Step 2: Evaluate options

Final Answer:

Quick Check:
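Attention weights are usually obtained by applying softmax to the raw scores, which turns them into positive values that sum to 1. A minimal sketch with hypothetical scores for one query against four keys, comparing `F.softmax` with the formula written out by hand:

```python
import torch
import torch.nn.functional as F

# Hypothetical raw scores of one query against four keys
scores = torch.tensor([[2.0, 1.0, 0.1, -1.0]])
weights = F.softmax(scores, dim=-1)

# Softmax by hand: exponentiate each score, then divide by the row sum
manual = torch.exp(scores) / torch.exp(scores).sum(dim=-1, keepdim=True)

print(weights)
print(weights.sum(dim=-1))             # 1 per row, up to float rounding
print(torch.allclose(weights, manual))  # True
```

The highest score receives the largest weight, but every key keeps a nonzero share.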

Solution

Step 1: Calculate dot products Q x K^T

Step 2: Apply softmax to scores

Step 3: Compute weighted sum of values

Step 4: Match option

Final Answer:

Quick Check:
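The three computation steps above (dot products, softmax, weighted sum) can be traced on a tiny example. The matrices `Q`, `K`, and `V` below are hypothetical toy values, not the ones from the original question. This sketch also divides the scores by sqrt(d_k), as the second code snippet does; remove that division for plain dot-product attention.

```python
import math

import torch
import torch.nn.functional as F

# Hypothetical toy inputs: 1 query, 2 keys/values, key dimension d_k = 2
Q = torch.tensor([[1.0, 0.0]])
K = torch.tensor([[1.0, 0.0],
                  [0.0, 1.0]])
V = torch.tensor([[10.0, 0.0],
                  [0.0, 10.0]])

# Step 1: dot products Q x K^T give one score per key
scores = Q @ K.T  # tensor([[1., 0.]])
# Step 2: scale by sqrt(d_k), then softmax over the keys
weights = F.softmax(scores / math.sqrt(K.size(-1)), dim=-1)
# Step 3: weighted sum of the value rows
output = weights @ V

print(scores)
print(weights)
print(output)
```

Because the query matches the first key more closely, the output leans toward the first value row.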

Solution

Step 1: Check dot product operation

Step 2: Analyze code

Final Answer:

Quick Check:
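The code analyzed in this question is not reproduced here, so the sketch below only illustrates the usual check for the dot-product step: query-key scores need a matrix product with the transposed keys. It contrasts that with two common mistakes, using the same shapes as the second code snippet (random tensors, not data from the question).

```python
import torch

torch.manual_seed(0)
query = torch.randn(1, 64)
key = torch.randn(10, 64)

# Correct: matrix product with the transposed keys gives one score per key
scores = torch.matmul(query, key.T)
print(scores.shape)  # torch.Size([1, 10])

# Mistake 1: elementwise product broadcasts to (10, 64) instead of scoring each key
wrong = query * key
print(wrong.shape)  # torch.Size([10, 64])

# Mistake 2: forgetting the transpose, (1, 64) @ (10, 64), has mismatched inner dimensions
try:
    torch.matmul(query, key)
except RuntimeError as err:
    print("shape error:", err)
```

Summing the elementwise product over the last dimension recovers the same scores, which is exactly what the matrix product does in one step.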

Solution

Step 1: Understand dot product scaling

Step 2: Role of scaling by sqrt of key dimension

Final Answer:

Quick Check:
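Dividing by the square root of the key dimension keeps the scores at a manageable scale. If query and key entries have mean 0 and variance 1, a dot product over d_k dimensions has variance of about d_k, and large scores push softmax toward a near one-hot distribution with very small gradients. A small experiment, assuming random Gaussian queries and keys with an arbitrary d_k = 512:

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_k = 512
n_keys = 1000

# Random query/key entries with mean 0 and variance 1
query = torch.randn(1, d_k)
keys = torch.randn(n_keys, d_k)

raw = query @ keys.T            # variance grows roughly like d_k
scaled = raw / math.sqrt(d_k)   # variance brought back to roughly 1

print(f"raw scores var:    {raw.var().item():.1f}")
print(f"scaled scores var: {scaled.var().item():.2f}")

# Large raw scores make softmax concentrate almost all weight on one key
print(f"max weight (raw):    {F.softmax(raw, dim=-1).max().item():.3f}")
print(f"max weight (scaled): {F.softmax(scaled, dim=-1).max().item():.3f}")
```

With scaling, the weights stay spread across many keys, which keeps softmax in a range where gradients are still informative during training.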