Sequence-to-sequence models are widely used in tasks like language translation. What is their main purpose?
Think about tasks like translating a sentence from English to French.
Sequence-to-sequence models take a sequence as input and produce another sequence as output, often of different length, such as translating sentences or summarizing text.
Consider a seq2seq model with an LSTM encoder and decoder. The encoder processes input sequences of length 10 with 16 features, and the decoder takes input sequences of length 12 with 20 features. What is the shape of the encoder's final hidden state and the decoder's output?
import torch
import torch.nn as nn

encoder = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
decoder = nn.LSTM(input_size=20, hidden_size=32, batch_first=True)

inputs = torch.randn(5, 10, 16)  # batch_size=5, seq_len=10, features=16
encoder_outputs, (h_n, c_n) = encoder(inputs)

decoder_inputs = torch.randn(5, 12, 20)  # batch_size=5, seq_len=12, features=20
decoder_outputs, _ = decoder(decoder_inputs, (h_n, c_n))
print(h_n.shape, decoder_outputs.shape)
Remember LSTM hidden states have shape (num_layers * num_directions, batch, hidden_size).
The encoder's final hidden state has shape (1, batch_size, hidden_size) = (1, 5, 32), because the LSTM has 1 layer and is unidirectional. The decoder's output has shape (batch_size, sequence_length, hidden_size) = (5, 12, 32).
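As a sanity check on the (num_layers * num_directions, batch, hidden_size) rule from the hint, here is a minimal sketch with illustrative sizes, using a 2-layer bidirectional LSTM (these sizes are assumptions, not part of the question above):

```python
import torch
import torch.nn as nn

# 2 layers and 2 directions -> first dim of h_n is 2 * 2 = 4.
lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=2,
               bidirectional=True, batch_first=True)
x = torch.randn(5, 10, 16)  # batch=5, seq_len=10, features=16
out, (h_n, c_n) = lstm(x)
print(h_n.shape)  # torch.Size([4, 5, 32])
print(out.shape)  # torch.Size([5, 10, 64]) -- both directions concatenated
```

Note that the per-step outputs concatenate the forward and backward directions, so their last dimension is 2 * hidden_size.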
You are training a sequence-to-sequence model for machine translation. Which hidden size choice is most likely to improve model capacity without causing excessive overfitting?
Think about balancing model complexity and regularization.
Increasing the hidden size can improve capacity, but it must be paired with regularization such as dropout and early stopping to avoid overfitting.
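A minimal sketch of that pairing in PyTorch; the specific values (hidden_size=256, dropout=0.3) are illustrative assumptions, not recommendations. Note that nn.LSTM applies dropout between stacked layers, so it only takes effect when num_layers is at least 2:

```python
import torch
import torch.nn as nn

# Larger hidden size for capacity, dropout between layers for regularization.
model = nn.LSTM(input_size=16, hidden_size=256, num_layers=2,
                dropout=0.3, batch_first=True)

x = torch.randn(5, 10, 16)  # batch=5, seq_len=10, features=16
out, _ = model(x)
print(out.shape)  # torch.Size([5, 10, 256])
```

Early stopping is implemented in the training loop rather than the model: track validation loss each epoch and stop when it has not improved for a set number of epochs.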
You trained a seq2seq model for text summarization. Which metric best measures how well the model output matches human summaries?
Think about metrics used in natural language generation tasks.
ROUGE compares n-gram overlap between generated and reference text with an emphasis on recall, making it the standard choice for summarization evaluation; BLEU, a precision-oriented n-gram overlap metric, is more commonly used for machine translation.
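The n-gram-overlap idea behind both BLEU and ROUGE can be sketched in a few lines. This is a simplified, single-reference illustration; ngram_precision is a hypothetical helper, not a library function, and real metrics add components such as brevity penalties and multi-reference handling:

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also appear in the reference
    (clipped counts, as in BLEU's modified precision)."""
    cand = Counter(tuple(candidate[i:i + n])
                   for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n])
                  for i in range(len(reference) - n + 1))
    matches = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return matches / max(total, 1)

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(ngram_precision(cand, ref, n=1))  # 5 of 6 unigrams match -> 0.833...
```

A recall-oriented metric like ROUGE instead divides matches by the number of reference n-grams, rewarding coverage of the human summary rather than precision of the generated one.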
Consider this simplified training loop for a seq2seq model. Why might the gradients explode?
for input_seq, target_seq in dataloader:
    optimizer.zero_grad()
    output_seq = model(input_seq, target_seq)
    loss = loss_fn(output_seq.view(-1, vocab_size), target_seq.view(-1))
    loss.backward()
    optimizer.step()
Think about common causes of exploding gradients in RNNs.
Without gradient clipping, gradients backpropagated through many time steps in an RNN can grow exponentially (each step multiplies by the recurrent weight matrix), causing instability during training.
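A minimal sketch of adding clipping to a loop like the one above. A tiny linear model and random tensors stand in for the seq2seq model and dataloader so the snippet is self-contained; the fix itself is the single clip_grad_norm_ call between backward() and step():

```python
import torch

# Stand-ins for the real model, data, loss, and optimizer.
model = torch.nn.Linear(8, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(5, 8), torch.randn(5, 4)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
# Rescale gradients so their global L2 norm is at most max_norm=1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

Norm clipping preserves the gradient's direction while capping its magnitude, which is why it is preferred over clipping each element independently.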