Attention helps models focus on the most relevant parts of their input, improving both learning efficiency and accuracy.
Why attention revolutionized deep learning in PyTorch
Introduction
When translating between languages, to decide which source words matter most for each output word.
When summarizing long texts, by focusing on key sentences.
When recognizing objects in images, by highlighting informative regions.
When answering questions, by attending to the relevant parts of a passage.
When improving speech recognition, by focusing on the most informative sounds.
Syntax
PyTorch
import torch
import torch.nn as nn

class SimpleAttention(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.attention = nn.Linear(input_dim, 1)

    def forward(self, x):
        scores = self.attention(x)               # shape: (batch, seq_len, 1)
        weights = torch.softmax(scores, dim=1)   # attention weights over seq_len
        weighted_sum = (weights * x).sum(dim=1)  # weighted average of the inputs
        return weighted_sum, weights
The attention layer learns to assign importance scores to each part of the input.
Softmax turns scores into probabilities that sum to 1, highlighting important parts.
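To see this concretely, here is a small sketch (with made-up score values) showing how softmax turns raw scores into weights that sum to 1, with larger scores receiving larger weights:

```python
import torch

# Toy scores for one sequence of 3 timesteps (illustrative values)
scores = torch.tensor([[2.0], [1.0], [0.1]])

# Softmax along the sequence dimension turns scores into weights
weights = torch.softmax(scores, dim=0)

print(weights.squeeze())      # larger scores get larger weights
print(weights.sum().item())   # the weights sum to 1
</imports>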
Examples
This example shows how to get attention scores and weights for a batch of sequences.
PyTorch
attention = nn.Linear(10, 1)
x = torch.randn(2, 5, 10)  # batch=2, seq_len=5, features=10
scores = attention(x)
weights = torch.softmax(scores, dim=1)
Weighted sum combines input features using attention weights to focus on important parts.
PyTorch
weighted_sum = (weights * x).sum(dim=1)
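Putting the two examples together, this sketch runs the full pipeline end to end and checks the shapes involved (the seed is only there to make the run reproducible):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

attention = nn.Linear(10, 1)
x = torch.randn(2, 5, 10)                # batch=2, seq_len=5, features=10
scores = attention(x)                    # (2, 5, 1): one score per timestep
weights = torch.softmax(scores, dim=1)   # weights along seq_len sum to 1
weighted_sum = (weights * x).sum(dim=1)  # (2, 10): one vector per sequence

print(weighted_sum.shape)
```

Note that the weighted sum collapses the sequence dimension: each sequence of 5 feature vectors is reduced to a single 10-dimensional vector, with the attention weights deciding how much each timestep contributes.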
Sample Model
This program shows a simple attention mechanism focusing on the first feature of each input vector. We set the layer's weights manually so the attention pattern is easy to predict and inspect.
PyTorch
import torch
import torch.nn as nn

class SimpleAttention(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.attention = nn.Linear(input_dim, 1)

    def forward(self, x):
        scores = self.attention(x)               # (batch, seq_len, 1)
        weights = torch.softmax(scores, dim=1)
        weighted_sum = (weights * x).sum(dim=1)
        return weighted_sum, weights

# Create dummy data: batch=1, seq_len=4, features=3
x = torch.tensor([[[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0],
                   [1.0, 1.0, 1.0]]])

model = SimpleAttention(input_dim=3)

# Manually set weights so the score is just the first feature of each vector
with torch.no_grad():
    model.attention.weight.copy_(torch.tensor([[1.0, 0.0, 0.0]]))
    model.attention.bias.fill_(0)

weighted_sum, weights = model(x)
print("Attention weights:", weights)
print("Weighted sum:", weighted_sum)
Output
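You can verify the printed attention weights by hand. With the weight vector set to [1, 0, 0] and zero bias, the score for each timestep is simply the first feature of its input vector, giving scores [1, 0, 0, 1]. A quick sketch of that check:

```python
import torch

# Scores produced by the manually-set weights [1, 0, 0]:
# the first feature of each of the four input vectors
scores = torch.tensor([1.0, 0.0, 0.0, 1.0])
weights = torch.softmax(scores, dim=0)

# Positions 0 and 3 (first feature = 1) get the largest, equal weights
print(weights)
```

This matches the model's output: attention concentrates on the two timesteps whose first feature is largest, exactly as the manually chosen weights predict.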
Important Notes
Attention helps models look at the most useful parts instead of everything equally.
It works well with sequences like sentences or time series.
Attention drove major improvements in machine translation and many other tasks.
Summary
Attention lets models focus on important data parts.
This focus improves learning and results.
It changed how deep learning models work, especially for language and vision.