Sequence models such as RNNs read the words of a sentence one at a time. Why is this order important?
Think about how understanding a sentence depends on the words that came before.
RNNs carry a hidden state that summarizes the words seen so far, so word order directly shapes the context and meaning the model captures.
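A small sketch of this idea, using the same nn.RNN API that appears below: feeding the same word vectors in reverse order changes the RNN's final hidden state, while an order-blind bag-of-words sum stays identical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=5, hidden_size=4, batch_first=True)

seq = torch.randn(1, 3, 5)       # one sequence of 3 word vectors
rev = torch.flip(seq, dims=[1])  # same words, reversed order

_, h_fwd = rnn(seq)
_, h_rev = rnn(rev)

# The RNN's hidden state depends on word order...
print(torch.allclose(h_fwd, h_rev))                    # False
# ...but a bag-of-words style sum over the sequence does not.
print(torch.allclose(seq.sum(dim=1), rev.sum(dim=1)))  # True
```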
Given a simple RNN that processes sequences of word vectors, what is the output shape after processing a batch of 2 sequences, each of length 3?
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=4, batch_first=True)
inputs = torch.randn(2, 3, 5)  # batch=2, seq_len=3, input_size=5
output, hn = rnn(inputs)
print(output.shape)
Remember that batch_first=True means the batch dimension comes first.
The output shape is (batch_size, sequence_length, hidden_size) = (2, 3, 4).
Among these models, which one inherently understands the order of words in a sentence without additional position information?
Think about which model processes words one after another.
RNNs process words sequentially, so they naturally capture word order. Transformers need positional encodings to see order, and Bag-of-Words models or an MLP on word counts ignore order entirely.
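To make the Transformer point concrete, here is a minimal sketch of the classic sinusoidal positional encoding from "Attention Is All You Need" (the function name is just for illustration): each position gets a distinct vector that is added to the word embedding, which is how a Transformer learns to tell "word 0" from "word 2".

```python
import torch

def sinusoidal_positions(seq_len, d_model):
    # Even dimensions get sin, odd dimensions get cos,
    # at geometrically spaced frequencies per dimension pair.
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    dim = torch.arange(0, d_model, 2, dtype=torch.float32)         # (d_model/2,)
    freq = pos / (10000.0 ** (dim / d_model))                      # (seq_len, d_model/2)
    enc = torch.zeros(seq_len, d_model)
    enc[:, 0::2] = torch.sin(freq)
    enc[:, 1::2] = torch.cos(freq)
    return enc

pe = sinusoidal_positions(seq_len=3, d_model=4)
print(pe.shape)  # torch.Size([3, 4])
```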
What is a common problem when training RNNs on very long sequences, and which hyperparameter adjustment can help?
Think about what happens to gradients when backpropagating through many steps.
Very long sequences cause gradients to vanish or explode as they are backpropagated through many time steps. Gradient clipping addresses the exploding case by capping the gradient norm to stabilize training; vanishing gradients are usually tackled with gated architectures such as LSTMs or GRUs.
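A minimal sketch of gradient clipping in PyTorch, reusing the nn.RNN setup from earlier (the linear head and target values here are illustrative assumptions): torch.nn.utils.clip_grad_norm_ is called after backward() and before the optimizer step, so the update never uses a gradient whose global norm exceeds the cap.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=5, hidden_size=4, batch_first=True)
head = nn.Linear(4, 1)
params = list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.SGD(params, lr=0.1)

inputs = torch.randn(2, 50, 5)  # longer sequences: 50 steps of backprop through time
targets = torch.randn(2, 1)

output, hn = rnn(inputs)
loss = nn.functional.mse_loss(head(output[:, -1]), targets)
loss.backward()

# Cap the global gradient norm at 1.0 before the optimizer step.
pre_clip_norm = torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
opt.step()
print(pre_clip_norm)  # the norm measured before clipping was applied
```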
A language model has a perplexity of 50 on a test set. What does this number mean?
Perplexity measures how well a model predicts a sequence; lower is better.
A perplexity of 50 means that, on average, the model is as uncertain as if it had to choose uniformly among 50 equally likely options for the next word.
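This intuition follows from the definition: perplexity is the exponential of the average negative log-likelihood per word. A tiny worked sketch with hypothetical probabilities (each chosen so the model assigns the correct next word a 1-in-50 chance):

```python
import math

# Hypothetical probabilities the model assigned to each correct next word.
probs = [0.02, 0.02, 0.02, 0.02]  # 0.02 = 1 in 50

avg_nll = -sum(math.log(p) for p in probs) / len(probs)  # average negative log-likelihood
perplexity = math.exp(avg_nll)
print(round(perplexity))  # 50
```

If the model were perfectly confident (all probabilities 1.0), the perplexity would be 1, the best possible score.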