
Transformer encoder in PyTorch

Introduction

A Transformer encoder helps a computer understand sequences like sentences by looking at all parts at once, not just step-by-step.

When you want to understand the meaning of a sentence for translation.
When analyzing time series data to find patterns.
When processing audio signals to recognize speech.
When building recommendation systems that consider user history.
When summarizing long documents by focusing on important parts.
Syntax
PyTorch
torch.nn.TransformerEncoder(encoder_layer, num_layers, norm=None)

encoder_layer is a single Transformer encoder block (a TransformerEncoderLayer) that the encoder copies and repeats.

num_layers is how many times you stack the encoder blocks.
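To see the stacking in action, here is a small sketch (toy sizes chosen for illustration). It inspects the encoder's internal layers list, which in current PyTorch versions is a ModuleList of independent deep copies of the layer you pass in, so the stacked blocks do not share weights.

```python
import torch.nn as nn

# A single encoder block: 32-dim model, 4 attention heads (toy sizes)
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4)

# Stack it 3 times; TransformerEncoder deep-copies the layer,
# so each of the 3 stacked blocks gets its own independent weights
encoder = nn.TransformerEncoder(layer, num_layers=3)

print(len(encoder.layers))                      # 3 separate copies
print(encoder.layers[0] is encoder.layers[1])   # False: weights are not shared
```

Because each copy has its own parameters, training updates each stacked block separately.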

Examples
This creates a Transformer encoder with 6 layers, each with 512 features (d_model) and 8 attention heads.
PyTorch
encoder_layer = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8)
transformer_encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=6)
This creates a smaller Transformer encoder with 3 layers and adds normalization for stability.
PyTorch
encoder_layer = torch.nn.TransformerEncoderLayer(d_model=256, nhead=4)
transformer_encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=3, norm=torch.nn.LayerNorm(256))
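Real batches usually mix sequences of different lengths, which are padded to the same length. The encoder's forward call accepts a src_key_padding_mask so attention ignores the padded positions. A minimal sketch using the 256-feature encoder from the example above (the mask values here are made up for illustration):

```python
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=256, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=3)

# Batch of 2 sequences, max length 5, 256 features (sequence-first layout)
src = torch.rand(5, 2, 256)

# True marks padding positions to ignore: the second sequence has
# only 3 real tokens, so its last 2 positions are masked out
padding_mask = torch.tensor([
    [False, False, False, False, False],  # sequence 1: no padding
    [False, False, False, True,  True ],  # sequence 2: last 2 padded
])  # shape (batch_size, sequence_length)

output = encoder(src, src_key_padding_mask=padding_mask)
print(output.shape)  # torch.Size([5, 2, 256])
```

Note the mask is batch-first, shape (batch_size, sequence_length), even though the input tensor here is sequence-first.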
Sample Model

This code builds a Transformer encoder with 2 layers. It processes a random sequence of length 10 with 16 features. The output shape shows that the sequence length, batch size, and feature size all stay the same.

PyTorch
import torch
import torch.nn as nn

# Define a single encoder layer
encoder_layer = nn.TransformerEncoderLayer(d_model=16, nhead=4)
# Stack 2 layers to build the encoder
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Create a sample input: sequence length=10, batch size=1, feature size=16
input_seq = torch.rand(10, 1, 16)

# Pass input through the transformer encoder
output = transformer_encoder(input_seq)

# Print output shape and first vector
print('Output shape:', output.shape)
print('First output vector:', output[0, 0])
The printed output shape is torch.Size([10, 1, 16]); the values of the first output vector vary from run to run because the input is random.
Important Notes

By default, the input to TransformerEncoder must have shape (sequence_length, batch_size, feature_size). Pass batch_first=True when creating the TransformerEncoderLayer to use (batch_size, sequence_length, feature_size) instead.

TransformerEncoderLayer uses multi-head attention to look at all parts of the sequence together.

Adding normalization (LayerNorm) can help training be more stable.
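The notes above can be checked with a short sketch. It builds the same 16-feature, 2-layer encoder as the sample model but with batch_first=True, so the batch dimension comes first:

```python
import torch
import torch.nn as nn

# batch_first=True switches the expected layout to (batch, seq, feature)
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

x = torch.rand(1, 10, 16)  # batch size 1, sequence length 10, 16 features
out = encoder(x)
print(out.shape)  # torch.Size([1, 10, 16]) -- shape is preserved
```

The batch-first layout is often more convenient because most PyTorch data loaders yield batches with the batch dimension first.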

Summary

A Transformer encoder processes sequences by looking at all parts at once.

It stacks multiple encoder layers for deeper understanding.

Input shape is (sequence length, batch size, features) and output shape matches input shape.