How to Use Positional Encoding in PyTorch: Simple Guide
In PyTorch, positional encoding is used to add information about the position of tokens in a sequence, typically by creating a fixed or learnable tensor added to input embeddings. You can implement it by defining a positional encoding class that generates sinusoidal or learned embeddings and adds them to your input tensor before feeding it to the model.
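Both variants take only a few lines. As a contrast to the sinusoidal version shown below, here is a minimal sketch of the *learnable* variant using `torch.nn.Embedding` (the class name and sizes are illustrative choices, not from a specific library):

```python
import torch

class LearnedPositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        # One trainable vector per position, updated by backprop
        self.pos_emb = torch.nn.Embedding(max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        positions = torch.arange(x.size(1), device=x.device)  # (seq_len,)
        return x + self.pos_emb(positions)  # broadcasts over the batch

x = torch.randn(2, 4, 6)
out = LearnedPositionalEncoding(d_model=6)(x)
print(out.shape)  # torch.Size([2, 4, 6])
```

Unlike the sinusoidal version, these position vectors are parameters, so they are learned during training and capped at `max_len` positions.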
Syntax
Positional encoding in PyTorch usually involves creating a tensor that encodes position information and adding it to input embeddings. The common pattern is:
- `positional_encoding = PositionalEncoding(d_model, max_len)`: Initialize with embedding size and max sequence length.
- `output = input_embeddings + positional_encoding.pe[:, :input_length]`: Add positional encoding to input embeddings.
This helps the model understand the order of tokens in sequences.
```python
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-torch.log(torch.tensor(10000.0)) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        pe = pe.unsqueeze(0)  # Shape: (1, max_len, d_model)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:, :x.size(1)]
        return x
```
Example
This example shows how to create positional encoding and add it to a batch of input embeddings with PyTorch.
```python
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-torch.log(torch.tensor(10000.0)) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)  # Shape: (1, max_len, d_model)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:, :x.size(1)]
        return x

# Parameters
batch_size = 2
seq_len = 4
d_model = 6

# Random input embeddings
input_embeddings = torch.randn(batch_size, seq_len, d_model)

# Create positional encoding
pos_encoder = PositionalEncoding(d_model)

# Add positional encoding to input embeddings
output = pos_encoder(input_embeddings)

print("Input embeddings shape:", input_embeddings.shape)
print("Output embeddings shape:", output.shape)
print("Output embeddings sample:", output[0, 0])
```
Output
Input embeddings shape: torch.Size([2, 4, 6])
Output embeddings shape: torch.Size([2, 4, 6])
Output embeddings sample: tensor([-0.0953, 0.1537, 0.3127, -0.0903, 0.0737, 0.0953])
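The encoded tensor can then go straight into a downstream model, for example `torch.nn.TransformerEncoder`. A minimal sketch, where the layer sizes (`nhead=2`, `num_layers=1`) are illustrative choices and `PositionalEncoding` is the class from the example above:

```python
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-torch.log(torch.tensor(10000.0)) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        return x + self.pe[:, :x.size(1)]

d_model = 6
x = torch.randn(2, 4, d_model)            # (batch, seq_len, d_model)
encoded = PositionalEncoding(d_model)(x)  # add position info once

# d_model must be divisible by nhead
layer = torch.nn.TransformerEncoderLayer(d_model, nhead=2, batch_first=True)
encoder = torch.nn.TransformerEncoder(layer, num_layers=1)
out = encoder(encoded)
print(out.shape)  # torch.Size([2, 4, 6])
```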
Common Pitfalls
- Not matching embedding size: The positional encoding dimension `d_model` must match the input embedding size.
- Ignoring batch dimension: Positional encoding should broadcast correctly across the batch dimension.
- Using a fixed max length that is too small: If your sequence length exceeds `max_len`, positional encoding will fail or truncate.
- Adding positional encoding multiple times: Add it only once before feeding embeddings to the model.
```python
import torch

# Assumes the PositionalEncoding class defined above is in scope

# Wrong: positional encoding dimension mismatch
try:
    pos_encoder_wrong = PositionalEncoding(d_model=8)  # d_model != input embedding size 6
    output_wrong = pos_encoder_wrong(torch.randn(2, 4, 6))
except Exception as e:
    print(f"Error due to dimension mismatch: {e}")

# Right: matching dimensions
pos_encoder_right = PositionalEncoding(d_model=6)
output_right = pos_encoder_right(torch.randn(2, 4, 6))
print("Positional encoding added correctly with matching dimensions.")
```
Output
Error due to dimension mismatch: The size of tensor a (6) must match the size of tensor b (8) at non-singleton dimension 2
Positional encoding added correctly with matching dimensions.
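The `max_len` pitfall can be demonstrated the same way. In this sketch (the sizes are illustrative), the addition fails because the encoding only holds `max_len` positions, so the broadcast cannot cover a longer sequence:

```python
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-torch.log(torch.tensor(10000.0)) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        return x + self.pe[:, :x.size(1)]

short_encoder = PositionalEncoding(d_model=6, max_len=8)
try:
    short_encoder(torch.randn(2, 16, 6))  # seq_len 16 > max_len 8
    failed = False
except RuntimeError as e:
    failed = True
    print(f"Sequence longer than max_len: {e}")
print("Failed as expected:", failed)
```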
Quick Reference
Remember these tips when using positional encoding in PyTorch:
- Match `d_model` with your embedding size.
- Set `max_len` to cover your longest sequence.
- Add positional encoding once before the model.
- Use `register_buffer` to keep positional encoding on the right device.
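The `register_buffer` tip can be checked directly: a registered buffer appears in `state_dict()` and follows `.to(device)` along with the module's parameters, while a plain tensor attribute does not. A minimal sketch (the class names are illustrative):

```python
import torch

class WithBuffer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Registered buffer: saved in state_dict, moved by .to(device)/.cuda()
        self.register_buffer('pe', torch.zeros(1, 8, 6))

class WithoutBuffer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Plain attribute: not tracked by the module
        self.pe = torch.zeros(1, 8, 6)

print('pe' in WithBuffer().state_dict())     # True
print('pe' in WithoutBuffer().state_dict())  # False
```

This is why the `PositionalEncoding` class above stores `pe` with `register_buffer` instead of assigning it as an ordinary attribute.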
Key Takeaways
- Positional encoding adds position info to embeddings so models understand token order.
- Ensure the positional encoding dimension matches your input embedding size exactly.
- Use sinusoidal or learned positional encoding as a tensor added to input embeddings.
- Add positional encoding once before feeding data into the model.
- Set the max sequence length in positional encoding to cover your longest input.