What is Sequence-to-sequence architecture in NLP?

Introduction

Sequence-to-sequence architecture helps computers turn one sequence of information into another. It is useful when the input and output are both sequences, like translating languages or summarizing text.

Translating a sentence from English to French.

Turning a spoken sentence into written text.

Summarizing a long article into a short paragraph.

Generating a reply in a chatbot conversation.

Converting a sequence of numbers into another sequence, like time series prediction.

Syntax

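# Encoder and Decoder are placeholder module classes (Sample Model below gives one definition)
encoder = Encoder(input_size, hidden_size)
decoder = Decoder(hidden_size, output_size)

# Forward pass: encode the input, then decode target_length steps from the final hidden state
encoder_outputs, encoder_hidden = encoder(input_sequence)
output_sequence = decoder(encoder_hidden, target_length)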

The encoder reads the input sequence one step at a time and compresses it into a fixed-size summary called the hidden state.

The decoder uses this hidden state to generate the output sequence step by step.
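To make the idea of a hidden-state summary concrete, here is a minimal sketch, assuming a single-layer nn.GRU stands in for the encoder. The sizes (8 input features, 16 hidden units) and the variable names are made up for illustration. It feeds a short and a long input through the same GRU and compares the shapes that come back.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only; they are not taken from the text above
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

short_input = torch.randn(1, 3, 8)    # batch=1, 3 time steps
long_input = torch.randn(1, 50, 8)    # batch=1, 50 time steps

short_outputs, short_hidden = gru(short_input)
long_outputs, long_hidden = gru(long_input)

# Per-step outputs grow with the input length
print(short_outputs.shape)  # torch.Size([1, 3, 16])
print(long_outputs.shape)   # torch.Size([1, 50, 16])

# The final hidden state has the same shape for both inputs
print(short_hidden.shape)   # torch.Size([1, 1, 16])
print(long_hidden.shape)    # torch.Size([1, 1, 16])
```

Whatever the input length, the decoder in this kind of model receives a hidden state of the same size, which is why it is described as a summary.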

Examples

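This example creates an encoder (input size 10, hidden size 20) and a decoder (hidden size 20, output size 15), then generates 5 output steps from the encoder's final hidden state. The decoder's hidden size must match the encoder's.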

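# Assumes Encoder and Decoder are defined as in the Sample Model section
encoder = Encoder(10, 20)   # input_size=10, hidden_size=20
decoder = Decoder(20, 15)   # hidden_size=20, output_size=15

# input_seq: float tensor of shape (batch, seq_len, 10)
encoder_outputs, encoder_hidden = encoder(input_seq)
output_seq = decoder(encoder_hidden, 5)   # generate 5 output steps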

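Here, both the encoder and the decoder use simple RNN layers to process sequences. SimpleRNNEncoder and SimpleRNNDecoder are illustrative class names, not classes that ship with PyTorch.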

# Using a simple RNN encoder and decoder
encoder = SimpleRNNEncoder(input_dim=50, hidden_dim=100)
decoder = SimpleRNNDecoder(hidden_dim=100, output_dim=50)

enc_out, enc_hidden = encoder(input_seq)
predicted_seq = decoder(enc_hidden, max_length=10)
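The snippet above does not define SimpleRNNEncoder or SimpleRNNDecoder. One way they could be written is sketched below, assuming each wraps a single nn.RNN layer and that the decoder starts from an all-zero input vector. Both class bodies and the input shape (batch of 2, 7 steps) are assumptions made for illustration, not a reference implementation.

```python
import torch
import torch.nn as nn


class SimpleRNNEncoder(nn.Module):
    """Hypothetical encoder: a single nn.RNN layer."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        outputs, hidden = self.rnn(x)
        return outputs, hidden  # hidden: (1, batch, hidden_dim)


class SimpleRNNDecoder(nn.Module):
    """Hypothetical decoder: feeds each prediction back in as the next input."""

    def __init__(self, hidden_dim, output_dim):
        super().__init__()
        self.rnn = nn.RNN(output_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, hidden, max_length):
        batch_size = hidden.size(1)
        # An all-zero vector stands in for a start-of-sequence input (assumption)
        step_input = torch.zeros(
            batch_size, 1, self.fc.out_features, device=hidden.device
        )
        steps = []
        for _ in range(max_length):
            out, hidden = self.rnn(step_input, hidden)
            step_input = self.fc(out)  # (batch, 1, output_dim)
            steps.append(step_input)
        return torch.cat(steps, dim=1)  # (batch, max_length, output_dim)


encoder = SimpleRNNEncoder(input_dim=50, hidden_dim=100)
decoder = SimpleRNNDecoder(hidden_dim=100, output_dim=50)

input_seq = torch.randn(2, 7, 50)  # batch=2, 7 steps, 50 features (made-up sizes)
enc_out, enc_hidden = encoder(input_seq)
predicted_seq = decoder(enc_hidden, max_length=10)
print(predicted_seq.shape)  # torch.Size([2, 10, 50])
```

The output length (10) is set by max_length alone, so it does not have to match the input length (7).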

Sample Model

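This code builds a simple sequence-to-sequence model using PyTorch. The encoder, a GRU, reads the input sequence and produces a hidden state. The decoder, a second GRU followed by a linear layer, starts from this hidden state and generates a new sequence of length 4. The script prints the output shape and the first output vector of the first sequence. The model is untrained, so the printed values are random.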

import torch
import torch.nn as nn

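class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        # outputs: (batch, seq_len, hidden_size); hidden: (1, batch, hidden_size)
        outputs, hidden = self.rnn(x)
        return outputs, hidden

class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size):
        super().__init__()
        # The GRU consumes the previous output vector, so its input size is output_size
        self.rnn = nn.GRU(output_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, hidden, seq_len):
        batch_size = hidden.size(1)
        # Start from an all-zero input, created on the same device as the hidden state
        inputs = torch.zeros(batch_size, 1, self.fc.out_features, device=hidden.device)
        outputs = []
        for _ in range(seq_len):
            out, hidden = self.rnn(inputs, hidden)
            out = self.fc(out)       # (batch, 1, output_size)
            outputs.append(out)
            inputs = out             # feed the prediction back in as the next input
        return torch.cat(outputs, dim=1)  # (batch, seq_len, output_size)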

# Create dummy input: batch=2, seq_len=3, input_size=4
input_seq = torch.randn(2, 3, 4)

encoder = Encoder(input_size=4, hidden_size=5)
decoder = Decoder(hidden_size=5, output_size=6)

encoder_outputs, encoder_hidden = encoder(input_seq)
output_seq = decoder(encoder_hidden, seq_len=4)

print('Output shape:', output_seq.shape)
print('Output sample:', output_seq[0, 0].detach().numpy())

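Output

Output shape: torch.Size([2, 4, 6])

The sample line prints a vector of 6 values. They differ from run to run because the input and the model weights are random.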

Important Notes

The encoder summarizes the input sequence into a fixed-size hidden state.

The decoder generates the output sequence one step at a time, often using its previous output as the next input.

Sequence lengths can vary, so models often pad shorter sequences to a common length and use special start and end tokens to mark where a sequence begins and stops.
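The note on varying lengths can be sketched with PyTorch's padding utilities. This is a minimal example under assumptions: the three sequences, their lengths, and the GRU sizes are made up, and zero is used as the padding value. It pads a batch to a common length, then packs it so the GRU skips the padded steps.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)

# Three sequences of different lengths; each step is a 4-dimensional vector
sequences = [torch.randn(5, 4), torch.randn(3, 4), torch.randn(2, 4)]
lengths = torch.tensor([len(s) for s in sequences])

# Pad with zeros up to the longest sequence: (batch=3, max_len=5, features=4)
padded = pad_sequence(sequences, batch_first=True, padding_value=0.0)
print(padded.shape)  # torch.Size([3, 5, 4])

# Packing records the true lengths so the GRU stops at each sequence's real end
packed = pack_padded_sequence(
    padded, lengths, batch_first=True, enforce_sorted=False
)

gru = nn.GRU(input_size=4, hidden_size=6, batch_first=True)
_, hidden = gru(packed)
print(hidden.shape)  # torch.Size([1, 3, 6])
```

With packing, each sequence's final hidden state is the one it would get if it were encoded on its own, without the padded steps. For token-ID inputs, a common alternative is to reserve one ID as a padding token and mask it out in the loss.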

Summary

Sequence-to-sequence models turn one sequence into another, like translating or summarizing.

They use an encoder to read input and a decoder to write output.

This architecture underlies many language and time-series tasks, including translation, summarization, and sequence forecasting.