How to Use Positional Encoding in PyTorch: Simple Guide
In PyTorch, positional encoding is used to add information about the position of tokens in a sequence, typically by creating a fixed or learnable tensor added to input embeddings. You can implement it by defining a positional encoding class that generates sinusoidal or learned embeddings and adds them to your input tensor before feeding it to the model.
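Both variants take only a few lines. As a contrast to the sinusoidal version shown below, here is a minimal sketch of the *learnable* variant using `torch.nn.Embedding` (the class name and sizes are illustrative choices, not from a specific library):

```python
import torch

class LearnedPositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        # One trainable vector per position, updated by backprop
        self.pos_emb = torch.nn.Embedding(max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        positions = torch.arange(x.size(1), device=x.device)  # (seq_len,)
        return x + self.pos_emb(positions)  # broadcasts over the batch

x = torch.randn(2, 4, 6)
out = LearnedPositionalEncoding(d_model=6)(x)
print(out.shape)  # torch.Size([2, 4, 6])
```

Unlike the sinusoidal version, these position vectors are parameters, so they are learned during training and capped at `max_len` positions.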
Syntax
Positional encoding in PyTorch usually involves creating a tensor that encodes position information and adding it to input embeddings. The common pattern is:
- `positional_encoding = PositionalEncoding(d_model, max_len)`: Initialize with embedding size and max sequence length.
- `output = input_embeddings + positional_encoding.pe[:, :input_length]`: Add positional encoding to input embeddings.
This helps the model understand the order of tokens in sequences.
```python
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-torch.log(torch.tensor(10000.0)) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        pe = pe.unsqueeze(0)  # Shape: (1, max_len, d_model)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:, :x.size(1)]
        return x
```
Example
This example shows how to create positional encoding and add it to a batch of input embeddings with PyTorch.
```python
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-torch.log(torch.tensor(10000.0)) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)  # Shape: (1, max_len, d_model)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:, :x.size(1)]
        return x

# Parameters
batch_size = 2
seq_len = 4
d_model = 6

# Random input embeddings
input_embeddings = torch.randn(batch_size, seq_len, d_model)

# Create positional encoding
pos_encoder = PositionalEncoding(d_model)

# Add positional encoding to input embeddings
output = pos_encoder(input_embeddings)

print("Input embeddings shape:", input_embeddings.shape)
print("Output embeddings shape:", output.shape)
print("Output embeddings sample:", output[0, 0])
```
Output
Input embeddings shape: torch.Size([2, 4, 6])
Output embeddings shape: torch.Size([2, 4, 6])
Output embeddings sample: tensor([-0.0953, 0.1537, 0.3127, -0.0903, 0.0737, 0.0953])
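The encoded tensor can then go straight into a downstream model, for example `torch.nn.TransformerEncoder`. A minimal sketch, where the layer sizes (`nhead=2`, `num_layers=1`) are illustrative choices and `PositionalEncoding` is the class from the example above:

```python
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-torch.log(torch.tensor(10000.0)) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        return x + self.pe[:, :x.size(1)]

d_model = 6
x = torch.randn(2, 4, d_model)            # (batch, seq_len, d_model)
encoded = PositionalEncoding(d_model)(x)  # add position info once

# d_model must be divisible by nhead
layer = torch.nn.TransformerEncoderLayer(d_model, nhead=2, batch_first=True)
encoder = torch.nn.TransformerEncoder(layer, num_layers=1)
out = encoder(encoded)
print(out.shape)  # torch.Size([2, 4, 6])
```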
Common Pitfalls
- Not matching embedding size: The positional encoding dimension `d_model` must match the input embedding size.
- Ignoring batch dimension: Positional encoding should broadcast correctly across the batch dimension.
- Using a fixed max length that is too small: If your sequence length exceeds `max_len`, positional encoding will fail or truncate.
- Adding positional encoding multiple times: Add it only once before feeding embeddings to the model.
```python
import torch

# Assumes the PositionalEncoding class defined above is in scope

# Wrong: positional encoding dimension mismatch
try:
    pos_encoder_wrong = PositionalEncoding(d_model=8)  # d_model != input embedding size 6
    output_wrong = pos_encoder_wrong(torch.randn(2, 4, 6))
except Exception as e:
    print(f"Error due to dimension mismatch: {e}")

# Right: matching dimensions
pos_encoder_right = PositionalEncoding(d_model=6)
output_right = pos_encoder_right(torch.randn(2, 4, 6))
print("Positional encoding added correctly with matching dimensions.")
```
Output
Error due to dimension mismatch: The size of tensor a (6) must match the size of tensor b (8) at non-singleton dimension 2
Positional encoding added correctly with matching dimensions.
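The `max_len` pitfall can be demonstrated the same way. In this sketch (the sizes are illustrative), the addition fails because the encoding only holds `max_len` positions, so the broadcast cannot cover a longer sequence:

```python
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-torch.log(torch.tensor(10000.0)) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        return x + self.pe[:, :x.size(1)]

short_encoder = PositionalEncoding(d_model=6, max_len=8)
try:
    short_encoder(torch.randn(2, 16, 6))  # seq_len 16 > max_len 8
    failed = False
except RuntimeError as e:
    failed = True
    print(f"Sequence longer than max_len: {e}")
print("Failed as expected:", failed)
```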
Quick Reference
Remember these tips when using positional encoding in PyTorch:
- Match `d_model` with your embedding size.
- Set `max_len` to cover your longest sequence.
- Add positional encoding once before the model.
- Use `register_buffer` to keep positional encoding on the right device.
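The `register_buffer` tip can be checked directly: a registered buffer appears in `state_dict()` and follows `.to(device)` along with the module's parameters, while a plain tensor attribute does not. A minimal sketch (the class names are illustrative):

```python
import torch

class WithBuffer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Registered buffer: saved in state_dict, moved by .to(device)/.cuda()
        self.register_buffer('pe', torch.zeros(1, 8, 6))

class WithoutBuffer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Plain attribute: not tracked by the module
        self.pe = torch.zeros(1, 8, 6)

print('pe' in WithBuffer().state_dict())     # True
print('pe' in WithoutBuffer().state_dict())  # False
```

This is why the `PositionalEncoding` class above stores `pe` with `register_buffer` instead of assigning it as an ordinary attribute.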
Key Takeaways
- Positional encoding adds position info to embeddings so models understand token order.
- Ensure the positional encoding dimension matches your input embedding size exactly.
- Use sinusoidal or learned positional encoding as a tensor added to input embeddings.
- Add positional encoding once before feeding data into the model.
- Set the max sequence length in positional encoding to cover your longest input.