PyTorch · ~5 mins

Why attention revolutionized deep learning in PyTorch

Introduction

Attention helps models focus on important parts of data, making learning smarter and faster.

Typical situations where attention helps:

When translating between languages, to identify which source words matter most for each output word.
When summarizing long texts, by focusing on the key sentences.
When recognizing objects in images, by highlighting the most informative regions.
When answering questions, by attending to the relevant parts of a passage.
When recognizing speech, by focusing on the informative parts of the audio signal.
Syntax
PyTorch
import torch
import torch.nn as nn

class SimpleAttention(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.attention = nn.Linear(input_dim, 1)

    def forward(self, x):
        scores = self.attention(x)               # (batch, seq_len, 1): one score per position
        weights = torch.softmax(scores, dim=1)   # normalize scores over seq_len
        weighted_sum = (weights * x).sum(dim=1)  # (batch, input_dim): weighted average over positions
        return weighted_sum, weights

The linear attention layer learns to assign an importance score to each position in the input sequence.

Softmax normalizes the scores into weights that sum to 1 across the sequence, so positions with relatively high scores receive high weights.
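A quick numeric check of that claim (the scores here are made up for illustration):

```python
import torch

scores = torch.tensor([[2.0], [0.0], [1.0]])  # one score per position
weights = torch.softmax(scores, dim=0)        # normalize over the 3 positions
print(weights.squeeze())      # largest score -> largest weight
print(weights.sum().item())   # 1.0
```

The exponential inside softmax keeps all weights positive, and the normalization guarantees they sum to exactly 1.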

Examples
This example shows how to get attention scores and weights for a batch of sequences.
PyTorch
import torch
import torch.nn as nn

attention = nn.Linear(10, 1)
x = torch.randn(2, 5, 10)  # batch=2, seq_len=5, features=10
scores = attention(x)                   # (2, 5, 1)
weights = torch.softmax(scores, dim=1)  # normalize over seq_len
The weighted sum collapses the sequence into a single vector, combining positions in proportion to their attention weights.
PyTorch
weighted_sum = (weights * x).sum(dim=1)  # (2, 10): one pooled vector per sequence
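One way to see what this pooling does: if every position gets the same weight, the weighted sum is just the mean over the sequence; unequal weights pull the result toward the high-weight positions. A small sanity check (the tensor values are made up):

```python
import torch

x = torch.randn(2, 5, 10)                 # batch=2, seq_len=5, features=10
uniform = torch.full((2, 5, 1), 0.2)      # equal weight (1/5) per position
pooled = (uniform * x).sum(dim=1)         # attention-style pooling
print(torch.allclose(pooled, x.mean(dim=1)))  # True: uniform attention = mean
```

Learned attention weights are rarely uniform, which is exactly the point: the model learns to deviate from a plain average.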
Sample Model

This program demonstrates a simple attention mechanism that scores each input vector by its first feature. The linear layer's weights are set manually so the resulting attention pattern is easy to predict.

PyTorch
import torch
import torch.nn as nn

class SimpleAttention(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.attention = nn.Linear(input_dim, 1)

    def forward(self, x):
        scores = self.attention(x)  # (batch, seq_len, 1)
        weights = torch.softmax(scores, dim=1)
        weighted_sum = (weights * x).sum(dim=1)
        return weighted_sum, weights

# Create dummy data: batch=1, seq_len=4, features=3
x = torch.tensor([[[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0],
                   [1.0, 1.0, 1.0]]])

model = SimpleAttention(input_dim=3)

# Manually set weights to see clear attention
with torch.no_grad():
    model.attention.weight.copy_(torch.tensor([[1.0, 0.0, 0.0]]))
    model.attention.bias.fill_(0)

weighted_sum, weights = model(x)
print("Attention weights:", weights)
print("Weighted sum:", weighted_sum)
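With the weights pinned to [1, 0, 0] and the bias at 0, each score is just the first feature of its row, so the attention weights are softmax([1, 0, 0, 1]) and you can reproduce them by hand (a sanity check on the model above, not part of the original program):

```python
import torch

scores = torch.tensor([1.0, 0.0, 0.0, 1.0])  # first feature of each input row
manual = torch.softmax(scores, dim=0)
print(manual)  # rows 0 and 3 share the highest weight, about 0.3655 each
```

Rows 0 and 3 (the ones with a nonzero first feature) dominate the weighted sum, which is what "attention picks important parts" means concretely here.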
Important Notes

Attention helps models look at the most useful parts instead of everything equally.

It works well with sequences like sentences or time series.

Attention produced large gains in machine translation and now underlies state-of-the-art models across many other tasks.

Summary

Attention lets models focus on important data parts.

This focus improves learning and results.

It changed how deep learning models work, especially for language and vision.