Sequence-to-sequence architecture helps computers turn one sequence of information into another. It is useful when the input and output are both sequences, like translating languages or summarizing text.
Sequence-to-sequence architecture in NLP
encoder = Encoder(input_size, hidden_size)
decoder = Decoder(hidden_size, output_size)
# Forward pass
encoder_outputs, encoder_hidden = encoder(input_sequence)
output_sequence = decoder(encoder_hidden, target_length)
The encoder reads the input sequence and creates a summary called the hidden state.
The decoder uses this hidden state to generate the output sequence step by step.
encoder = Encoder(10, 20)
decoder = Decoder(20, 15)
encoder_outputs, encoder_hidden = encoder(input_seq)
output_seq = decoder(encoder_hidden, 5)

# Using a simple RNN encoder and decoder
encoder = SimpleRNNEncoder(input_dim=50, hidden_dim=100)
decoder = SimpleRNNDecoder(hidden_dim=100, output_dim=50)
enc_out, enc_hidden = encoder(input_seq)
predicted_seq = decoder(enc_hidden, max_length=10)
This code builds a simple sequence-to-sequence model using PyTorch. The encoder reads the input sequence and creates a hidden state. The decoder uses this hidden state to generate a new sequence of length 4. The output shape and a sample output vector are printed.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        outputs, hidden = self.rnn(x)
        return outputs, hidden

class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.GRU(output_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, hidden, seq_len):
        batch_size = hidden.size(1)
        inputs = torch.zeros(batch_size, 1, self.fc.out_features)
        outputs = []
        for _ in range(seq_len):
            out, hidden = self.rnn(inputs, hidden)
            out = self.fc(out)
            outputs.append(out)
            inputs = out
        return torch.cat(outputs, dim=1)

# Create dummy input: batch=2, seq_len=3, input_size=4
input_seq = torch.randn(2, 3, 4)

encoder = Encoder(input_size=4, hidden_size=5)
decoder = Decoder(hidden_size=5, output_size=6)

encoder_outputs, encoder_hidden = encoder(input_seq)
output_seq = decoder(encoder_hidden, seq_len=4)

print('Output shape:', output_seq.shape)
print('Output sample:', output_seq[0, 0].detach().numpy())
The encoder summarizes the input sequence into a fixed-size hidden state.
The decoder generates the output sequence one step at a time, often using its previous output as the next input.
Sequence lengths can vary, so models often use padding or special tokens to handle this.
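The note on padding and special tokens can be sketched in plain Python. This is a minimal illustration, not a library API: the token ids `PAD`, `SOS`, `EOS` and the helper `pad_batch` are assumptions made up for this example.

```python
# Hypothetical special-token ids, chosen only for this example
PAD, SOS, EOS = 0, 1, 2

def pad_batch(sequences, pad_id=PAD):
    """Append EOS to each sequence, then right-pad all to the longest length."""
    with_eos = [seq + [EOS] for seq in sequences]
    max_len = max(len(seq) for seq in with_eos)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in with_eos]
    # Mask built from true lengths: 1 marks a real token, 0 marks padding
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in with_eos]
    return padded, mask

batch = [[5, 6, 7], [8, 9], [10]]
padded, mask = pad_batch(batch)
print(padded)  # [[5, 6, 7, 2], [8, 9, 2, 0], [10, 2, 0, 0]]
print(mask)    # [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0]]
```

The mask lets a loss function or attention layer ignore the padded positions, so the padding does not influence training.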
Sequence-to-sequence models turn one sequence into another, like translating or summarizing.
They use an encoder to read input and a decoder to write output.
This architecture is key for many language and time-based tasks.
Practice
Solution
Step 1: Understand the encoder's function
The encoder processes the input sequence and converts it into a meaningful representation.
Step 2: Differentiate encoder from decoder
The decoder uses this representation to generate the output sequence, so it does not directly read input.
Final Answer:
To read and understand the input sequence -> Option B
Quick Check:
Encoder = input reader [OK]
- Confusing encoder with decoder
- Thinking encoder generates output
- Assuming encoder evaluates accuracy
Solution
Step 1: Identify decoder's role
The decoder takes the encoded input and produces the output sequence step-by-step.
Step 2: Eliminate incorrect options
Encoding is done by the encoder, not the decoder; normalization and splitting are preprocessing steps.
Final Answer:
It generates the output sequence from the encoded input -> Option A
Quick Check:
Decoder = output generator [OK]
- Mixing encoder and decoder roles
- Confusing preprocessing with decoding
- Assuming decoder encodes input
encoded = encoder(input_sequence)
output = decoder(encoded)
print(len(output))
If the input sequence length is 5 and the model is trained to translate to a sequence of length 7, what will len(output) print?
Solution
Step 1: Understand input and output lengths
The input sequence length is 5, but the model is trained to produce output sequences of length 7.
Step 2: Recognize decoder output length
In this simplified setup the decoder produces the target length it was trained for, so the output length is 7 regardless of input length. In practice, a decoder usually stops when it emits an end-of-sequence token or reaches a maximum length.
Final Answer:
7 -> Option D
Quick Check:
Output length = trained target length = 7 [OK]
- Assuming output length equals input length
- Adding input and output lengths
- Saying output length is unknown
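To illustrate why output length does not depend on input length, here is a rough sketch of a greedy decoding loop. `toy_decoder_step` is a made-up stand-in for a trained decoder step, and the token ids are assumptions for this example only.

```python
# Hypothetical special-token ids, chosen only for this example
SOS, EOS = 1, 2

def toy_decoder_step(prev_token, state):
    """Stand-in for one step of a trained decoder (purely illustrative).
    Emits one ordinary token per step and EOS once `state` reaches zero."""
    if state == 0:
        return EOS, state
    return 10 + state, state - 1

def greedy_decode(initial_state, max_length):
    """Generate tokens until EOS is produced or max_length is reached."""
    tokens = []
    prev, state = SOS, initial_state
    for _ in range(max_length):
        prev, state = toy_decoder_step(prev, state)
        if prev == EOS:
            break
        tokens.append(prev)
    return tokens

# The output length is set by the decoder's stopping rule, not the input length
print(len(greedy_decode(initial_state=7, max_length=20)))  # 7
print(len(greedy_decode(initial_state=7, max_length=5)))   # 5 (cut off by max_length)
```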
for input_seq, target_seq in dataset:
    encoded = encoder(input_seq)
    output = decoder(encoded)
    loss = loss_function(output, target_seq)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
What is the likely error in this code?
Solution
Step 1: Recall training step order
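PyTorch accumulates gradients across backward passes, so they must be cleared before computing new gradients with loss.backward().
Step 2: Identify correct zero_grad() placement
optimizer.zero_grad() is conventionally called at the start of each iteration, before loss.backward(). Calling it only after optimizer.step() leaves the first backward pass exposed to any gradients left over from earlier code.
Final Answer:
Missing call to optimizer.zero_grad() before loss.backward() -> Option C
Quick Check: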
Clear grads before backward pass [OK]
- Calling zero_grad() after backward()
- Calling optimizer.step() before backward()
- Skipping zero_grad() entirely
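The conventional step order can be sketched as follows. This is a minimal example, assuming PyTorch is installed. The `nn.Linear` model and the random dataset are stand-ins for a real encoder-decoder pair and real data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)  # stand-in for the encoder-decoder pair (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_function = nn.MSELoss()

# Tiny synthetic dataset of (input, target) pairs
dataset = [(torch.randn(4, 3), torch.randn(4, 1)) for _ in range(3)]

for inputs, targets in dataset:
    optimizer.zero_grad()                        # clear old gradients first
    loss = loss_function(model(inputs), targets)
    loss.backward()                              # compute fresh gradients
    optimizer.step()                             # update parameters

# Why clearing matters: without zero_grad(), a second backward pass
# adds to the gradients already stored on the parameters.
optimizer.zero_grad()
x, y = dataset[0]
loss_function(model(x), y).backward()
grad_once = model.weight.grad.clone()
loss_function(model(x), y).backward()
grad_twice = model.weight.grad.clone()
print(torch.allclose(grad_twice, 2 * grad_once))
```

The final check compares the gradient after two backward passes with twice the gradient after one, which shows the accumulation directly.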
Solution
Step 1: Understand attention's purpose
Attention helps the decoder look at different parts of the input sequence when generating each output token.
Step 2: Compare with fixed vector encoding
Without attention, the encoder compresses input into one fixed vector, which can lose details.
Step 3: Eliminate incorrect options
Attention does not reduce input size, skip encoder, or replace decoder; it enhances focus during decoding.
Final Answer:
It allows the decoder to focus on relevant parts of the input sequence dynamically -> Option A
Quick Check:
Attention = dynamic focus on input [OK]
- Thinking attention reduces input size
- Believing attention skips encoder
- Assuming attention replaces decoder
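As a rough illustration of "dynamic focus", the sketch below uses dot-product scoring, which is one common form of attention and is not necessarily the variant this lesson has in mind. The function names and the toy vectors are assumptions for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(query, encoder_states):
    """Score each encoder state against the decoder query, then return
    the attention weights and the weighted sum (context vector)."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context

# Three encoder states (one per input position) and one decoder query
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]

weights, context = dot_product_attention(query, encoder_states)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

Positions whose states align with the query receive larger weights, so the context vector changes at every decoding step instead of being one fixed summary.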
