A Transformer encoder helps a computer understand sequences like sentences by looking at all parts at once, not just step-by-step.
Transformer encoder in PyTorch
torch.nn.TransformerEncoder(encoder_layer, num_layers, norm=None)

encoder_layer is a single Transformer encoder block that is cloned and stacked.
num_layers is how many copies of that block to stack.
norm is an optional normalization module (such as LayerNorm) applied to the output of the final layer.
encoder_layer = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8)
transformer_encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=6)
encoder_layer = torch.nn.TransformerEncoderLayer(d_model=256, nhead=4)
transformer_encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=3, norm=torch.nn.LayerNorm(256))
The code below builds a Transformer encoder with 2 layers and passes a random sequence of length 10 with 16 features through it. The printed output shape shows that the sequence length, batch size, and feature size stay the same.
import torch
import torch.nn as nn

# Define a single encoder layer
encoder_layer = nn.TransformerEncoderLayer(d_model=16, nhead=4)

# Stack 2 layers to build the encoder
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Create a sample input: sequence length=10, batch size=1, feature size=16
input_seq = torch.rand(10, 1, 16)

# Pass input through the transformer encoder
output = transformer_encoder(input_seq)

# Print output shape and first vector
print('Output shape:', output.shape)
print('First output vector:', output[0, 0])
By default, the input to TransformerEncoder must have shape (sequence_length, batch_size, feature_size), where feature_size equals d_model.
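If you prefer batch-first tensors, TransformerEncoderLayer also accepts a batch_first=True flag (available in recent PyTorch versions), which changes the expected input shape to (batch_size, sequence_length, feature_size). A minimal sketch:

```python
import torch
import torch.nn as nn

# batch_first=True makes the encoder expect (batch_size, sequence_length, feature_size)
encoder_layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

batch = torch.rand(1, 10, 16)  # batch size 1, sequence length 10, 16 features
out = transformer_encoder(batch)
print(out.shape)  # torch.Size([1, 10, 16])
```

As before, the output shape matches the input shape; only the layout convention changes.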
TransformerEncoderLayer uses multi-head self-attention, so every position can attend to all other positions in the sequence; note that d_model must be divisible by nhead.
Adding normalization (LayerNorm) can help training be more stable.
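In real batches, sequences are usually padded to a common length. The encoder's forward method accepts a src_key_padding_mask argument so attention ignores the padded positions; a boolean True in the mask marks a position to ignore. A minimal sketch:

```python
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=16, nhead=4)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

seq = torch.rand(10, 2, 16)  # sequence length 10, batch of 2, 16 features

# Mask shape is (batch_size, sequence_length); True marks padding to ignore
padding_mask = torch.zeros(2, 10, dtype=torch.bool)
padding_mask[1, 7:] = True  # pretend the second sequence has only 7 real tokens

out = transformer_encoder(seq, src_key_padding_mask=padding_mask)
print(out.shape)  # torch.Size([10, 2, 16])
```

The output shape is unchanged; masking only affects which positions contribute to attention.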
Transformer encoder processes sequences by looking at all parts at once.
It stacks multiple encoder layers for deeper understanding.
Input shape is (sequence length, batch size, features) and output shape matches input shape.