Attention lets a model focus on the most relevant parts of its input when making a decision. It works much like the way we pick out the key words in a sentence to understand its meaning.
Attention mechanism basics in NLP
Introduction
Attention appears in many common tasks, for example:
Translating a sentence from one language to another
Answering questions based on a paragraph of text
Summarizing a long article into a short summary
Recognizing objects in an image by focusing on important areas
Syntax
attention_scores = query @ key.T / sqrt(d_k)
attention_weights = softmax(attention_scores)
output = attention_weights @ value
query, key, and value are vectors or matrices representing parts of the input.
The division by sqrt(d_k) rescales the scores so they do not grow with the key dimension, which keeps the softmax from saturating (see the sketch below).
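Why the scaling matters: raw dot products grow with d_k, and large scores push softmax toward a one-hot vector, so almost all weight lands on one position. The short numpy sketch below uses made-up scores to show the effect.

import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=-1, keepdims=True)

scores = np.array([2.0, 1.0])

# Moderate scores give a soft, informative weighting.
print(softmax(scores))        # ~[0.731 0.269]

# The same scores magnified (as unscaled dot products are when d_k is large)
# saturate the softmax toward a one-hot vector.
print(softmax(scores * 8.0))  # ~[0.9997 0.0003]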
Examples
This example shows how to calculate attention weights and output using simple vectors.
import torch
import torch.nn.functional as F

query = torch.tensor([[1., 0., 1.]])
key = torch.tensor([[1., 0., 0.], [0., 1., 0.]])
value = torch.tensor([[1., 2.], [3., 4.]])

scores = query @ key.T / (3 ** 0.5)
weights = F.softmax(scores, dim=1)
output = weights @ value
print(output)
This example uses numpy to do the same attention calculation without PyTorch.
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=-1, keepdims=True)

query = np.array([1, 0, 1])
key = np.array([[1, 0, 0], [0, 1, 0]])
value = np.array([[1, 2], [3, 4]])

scores = query @ key.T / np.sqrt(3)
weights = softmax(scores)
output = weights @ value
print(output)
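The same formula handles a whole sequence at once when query is a matrix with one row per position, which is how self-attention applies it. Below is a minimal numpy sketch; the three query rows are made-up values for illustration.

import numpy as np

def softmax(x):
    # Row-wise softmax so each query gets its own weight distribution
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e_x / e_x.sum(axis=-1, keepdims=True)

# Three query vectors (e.g. three token positions), two key/value pairs
query = np.array([[1., 0., 1.],
                  [0., 1., 0.],
                  [1., 1., 0.]])
key = np.array([[1., 0., 0.],
                [0., 1., 0.]])
value = np.array([[1., 2.],
                  [3., 4.]])

scores = query @ key.T / np.sqrt(3)  # shape (3, 2): one score row per query
weights = softmax(scores)            # each row sums to 1
output = weights @ value             # shape (3, 2): one output row per query
print(output)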
Sample Model
This program shows a simple attention mechanism calculation step-by-step using PyTorch. It prints the scores, weights, and final output vector.
import torch
import torch.nn.functional as F

# Define query, key, value tensors
query = torch.tensor([[1., 0., 1.]])              # shape (1, 3)
key = torch.tensor([[1., 0., 0.], [0., 1., 0.]])  # shape (2, 3)
value = torch.tensor([[1., 2.], [3., 4.]])        # shape (2, 2)

d_k = query.size(-1)  # dimension of key vectors

# Calculate attention scores
scores = query @ key.T / (d_k ** 0.5)  # shape (1, 2)

# Apply softmax to get attention weights
weights = F.softmax(scores, dim=1)  # shape (1, 2)

# Multiply weights by values to get output
output = weights @ value  # shape (1, 2)

print(f"Attention scores: {scores}")
print(f"Attention weights: {weights}")
print(f"Output: {output}")
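If you are on PyTorch 2.0 or newer, you can cross-check the manual calculation against the built-in F.scaled_dot_product_attention; this sketch assumes a 2.x install, since the function does not exist in older releases.

import torch
import torch.nn.functional as F

query = torch.tensor([[1., 0., 1.]])              # shape (1, 3)
key = torch.tensor([[1., 0., 0.], [0., 1., 0.]])  # shape (2, 3)
value = torch.tensor([[1., 2.], [3., 4.]])        # shape (2, 2)

# Manual scaled dot-product attention, as in the sample model above
weights = F.softmax(query @ key.T / (query.size(-1) ** 0.5), dim=1)
manual = weights @ value

# Built-in version (PyTorch 2.0+); it applies the same 1/sqrt(d_k) scaling.
# A leading batch dimension is added because the backends expect one.
builtin = F.scaled_dot_product_attention(
    query.unsqueeze(0), key.unsqueeze(0), value.unsqueeze(0)
).squeeze(0)

print(manual)
print(builtin)  # should match the manual result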
Important Notes
Attention helps models decide what to focus on, improving understanding.
Softmax turns scores into probabilities that add up to 1.
In practice, query, key, and value are usually produced by learned linear projections of the input or of a previous layer's output, as the sketch below shows.
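To make that last note concrete, the sketch below derives query, key, and value from one input with learned linear layers. The dimensions and random input here are made up for illustration; in a trained model the projection weights are learned from data.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # reproducible random weights and input

d_model, d_k = 4, 3
x = torch.randn(2, d_model)  # two input positions, e.g. token embeddings

# Learned projections that map the input to query, key, and value
W_q = nn.Linear(d_model, d_k, bias=False)
W_k = nn.Linear(d_model, d_k, bias=False)
W_v = nn.Linear(d_model, d_k, bias=False)

query, key, value = W_q(x), W_k(x), W_v(x)  # each has shape (2, 3)

scores = query @ key.T / (d_k ** 0.5)
weights = F.softmax(scores, dim=-1)
output = weights @ value
print(output)  # shape (2, 3): one attended vector per input position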
Summary
Attention identifies the important parts of the input to focus on.
It uses query, key, and value vectors to compute weighted outputs.
Softmax turns the scores into weights that sum to one.