Sequence-to-sequence architecture in NLP

Sequence-to-sequence architecture helps computers turn one sequence of information into another. It is useful when the input and output are both sequences, as in translating between languages or summarizing text.
In outline, the two parts fit together like this:

# Build an encoder-decoder pair
encoder = Encoder(input_size, hidden_size)
decoder = Decoder(hidden_size, output_size)

# Forward pass: encode the input, then decode from the final hidden state
encoder_outputs, encoder_hidden = encoder(input_sequence)
output_sequence = decoder(encoder_hidden, target_length)

The encoder reads the input sequence and creates a summary called the hidden state.
The decoder uses this hidden state to generate the output sequence step by step.
For example, with concrete sizes:

encoder = Encoder(10, 20)    # input size 10, hidden size 20
decoder = Decoder(20, 15)    # hidden size 20, output size 15
encoder_outputs, encoder_hidden = encoder(input_seq)
output_seq = decoder(encoder_hidden, 5)   # generate 5 output steps
# Using a simple RNN encoder and decoder
encoder = SimpleRNNEncoder(input_dim=50, hidden_dim=100)
decoder = SimpleRNNDecoder(hidden_dim=100, output_dim=50)
enc_out, enc_hidden = encoder(input_seq)
predicted_seq = decoder(enc_hidden, max_length=10)
This code builds a simple sequence-to-sequence model using PyTorch. The encoder reads the input sequence and creates a hidden state. The decoder uses this hidden state to generate a new sequence of length 4. The output shape and a sample output vector are printed.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # Returns the per-step outputs and the final hidden state
        outputs, hidden = self.rnn(x)
        return outputs, hidden

class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.GRU(output_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, hidden, seq_len):
        batch_size = hidden.size(1)
        # Start from a zero vector; each step feeds the previous output back in
        inputs = torch.zeros(batch_size, 1, self.fc.out_features)
        outputs = []
        for _ in range(seq_len):
            out, hidden = self.rnn(inputs, hidden)
            out = self.fc(out)
            outputs.append(out)
            inputs = out
        return torch.cat(outputs, dim=1)

# Create dummy input: batch=2, seq_len=3, input_size=4
input_seq = torch.randn(2, 3, 4)
encoder = Encoder(input_size=4, hidden_size=5)
decoder = Decoder(hidden_size=5, output_size=6)

encoder_outputs, encoder_hidden = encoder(input_seq)
output_seq = decoder(encoder_hidden, seq_len=4)
print('Output shape:', output_seq.shape)      # torch.Size([2, 4, 6])
print('Output sample:', output_seq[0, 0].detach().numpy())
The encoder summarizes the input sequence into a fixed-size hidden state.
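As a quick check of that fixed size, the hidden state produced by the Encoder class defined above has the same shape regardless of how long the input is (a minimal sketch; the sizes are arbitrary):

import torch

# Assumes the Encoder class from the example above is in scope
enc = Encoder(input_size=4, hidden_size=5)
_, h_short = enc(torch.randn(2, 3, 4))    # batch of 2, length-3 inputs
_, h_long = enc(torch.randn(2, 10, 4))    # batch of 2, length-10 inputs
print(h_short.shape, h_long.shape)        # both torch.Size([1, 2, 5])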
The decoder generates the output sequence one step at a time, often using its previous output as the next input.
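During training, this feedback loop is often combined with teacher forcing: some steps feed in the ground-truth target instead of the model's own prediction. Below is a minimal sketch of that idea; the function name, its arguments, and the teacher_forcing_ratio parameter are illustrative, not part of the code above:

import random
import torch
import torch.nn as nn

def decode_with_teacher_forcing(rnn, fc, hidden, target_seq, teacher_forcing_ratio=0.5):
    # target_seq: (batch, seq_len, output_size) ground-truth vectors (hypothetical)
    batch_size, seq_len, output_size = target_seq.shape
    inputs = torch.zeros(batch_size, 1, output_size)  # start-of-sequence placeholder
    outputs = []
    for t in range(seq_len):
        out, hidden = rnn(inputs, hidden)
        out = fc(out)
        outputs.append(out)
        if random.random() < teacher_forcing_ratio:
            inputs = target_seq[:, t:t+1, :]   # feed the ground truth
        else:
            inputs = out.detach()              # feed the model's own prediction
    return torch.cat(outputs, dim=1)

# Example usage with made-up sizes
rnn = nn.GRU(6, 5, batch_first=True)
fc = nn.Linear(5, 6)
hidden = torch.zeros(1, 2, 5)
target = torch.randn(2, 4, 6)
print(decode_with_teacher_forcing(rnn, fc, hidden, target).shape)  # torch.Size([2, 4, 6])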
Sequence lengths can vary, so models often use padding or special tokens to handle this.
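For instance, PyTorch's pad_sequence utility pads a batch of variable-length sequences to a common length (a minimal sketch with made-up sizes):

import torch
from torch.nn.utils.rnn import pad_sequence

# Three sequences of different lengths (3, 5, 2), each step a 4-dim vector
seqs = [torch.randn(3, 4), torch.randn(5, 4), torch.randn(2, 4)]

# Shorter sequences are padded with zeros so they can be stacked into one batch
padded = pad_sequence(seqs, batch_first=True, padding_value=0.0)
print(padded.shape)  # torch.Size([3, 5, 4])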
Sequence-to-sequence models turn one sequence into another, like translating or summarizing.
They use an encoder to read input and a decoder to write output.
This architecture is central to many language tasks and other sequential problems such as time-series prediction.