Consider the following PyTorch code snippet using an LSTM layer. What will be the shape of the hidden state h_n after running the forward pass?
```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
inputs = torch.randn(5, 3, 10)  # seq_len=5, batch=3, input_size=10
output, (h_n, c_n) = lstm(inputs)
# What is h_n.shape?
```
Remember the hidden state shape is (num_layers, batch_size, hidden_size).
The hidden state h_n of an LSTM has shape (num_layers, batch_size, hidden_size). Here, num_layers=2, batch_size=3, hidden_size=20, so the shape is (2, 3, 20).
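A quick way to confirm this is to run the snippet and inspect all three returned shapes; the check below mirrors the question's setup exactly:

```python
import torch
import torch.nn as nn

# Same configuration as the question: 2 layers, hidden size 20.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
inputs = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(inputs)

# output holds the top layer's hidden state at every time step,
# while h_n and c_n hold the final states of every layer.
print(output.shape)  # torch.Size([5, 3, 20])
print(h_n.shape)     # torch.Size([2, 3, 20])
print(c_n.shape)     # torch.Size([2, 3, 20])
```

Note the distinction: `output` is per-time-step (seq_len first), while `h_n` is per-layer (num_layers first).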
In training recurrent neural networks, why is it important to detach the hidden state from the computation graph between batches?
Think about how backpropagation works through time in RNNs.
Detaching the hidden state stops gradients from flowing back through all previous batches (truncated backpropagation through time), which prevents unbounded memory growth and prohibitively long backward passes.
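A minimal truncated-BPTT training loop illustrating the detach step is sketched below; the model, data, and loss are placeholders, not part of the original question:

```python
import torch
import torch.nn as nn

# Illustrative setup: a small LSTM trained on consecutive chunks
# of one long sequence, carrying the hidden state across chunks.
rnn = nn.LSTM(input_size=10, hidden_size=20)
opt = torch.optim.SGD(rnn.parameters(), lr=0.01)
hidden = None

for _ in range(3):  # three consecutive chunks
    chunk = torch.randn(5, 3, 10)
    output, hidden = rnn(chunk, hidden)
    loss = output.pow(2).mean()  # placeholder loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Detach so the next backward() stops at this chunk boundary
    # instead of traversing every earlier chunk's graph.
    hidden = tuple(h.detach() for h in hidden)
```

Without the final line, each `backward()` would try to propagate through all previously processed chunks, and PyTorch would also error out because those earlier graphs were already freed.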
What is wrong with the following code snippet for initializing hidden states in a GRU model?
```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16, num_layers=1)
batch_size = 4
h0 = torch.zeros(batch_size, 16)  # Intended hidden state
inputs = torch.randn(10, batch_size, 8)
output, hn = gru(inputs, h0)
```
Check the expected shape of the initial hidden state for GRU layers.
The initial hidden state h0 must have shape (num_layers, batch_size, hidden_size). A 2-D tensor that omits the leading num_layers dimension does not match what the GRU expects, so PyTorch raises a shape-mismatch RuntimeError.
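A corrected version might look like the following, building h0 from the module's own attributes so the shape stays right even if the configuration changes:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16, num_layers=1)
batch_size = 4
# Correct: h0 is 3-D, (num_layers, batch_size, hidden_size) = (1, 4, 16).
h0 = torch.zeros(gru.num_layers, batch_size, gru.hidden_size)
inputs = torch.randn(10, batch_size, 8)
output, hn = gru(inputs, h0)

print(output.shape)  # torch.Size([10, 4, 16])
print(hn.shape)      # torch.Size([1, 4, 16])
```

Passing no h0 at all is also valid: PyTorch then defaults to a zero tensor of exactly this shape.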
Increasing the hidden state size in an RNN model primarily affects what aspect of the network?
Think about what hidden state size controls in an RNN.
The hidden state size controls the dimensionality of the internal memory, allowing the model to represent more complex information.
In a sequence classification task using an LSTM, which hidden state output is typically used as the representation for the entire sequence?
Consider which hidden state summarizes the sequence information after processing.
The last hidden state of the top LSTM layer is commonly used as the fixed-size vector representing the entire input sequence for classification.
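In code, that representation is `h_n[-1]`: the last entry along the layer dimension. A sketch of a classification head built on it (the 5-class linear layer is a hypothetical example, not from the question):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
classifier = nn.Linear(20, 5)  # hypothetical 5-class task

inputs = torch.randn(7, 3, 10)  # (seq_len, batch, input_size)
_, (h_n, _) = lstm(inputs)
seq_repr = h_n[-1]              # top layer's final hidden state: (3, 20)
logits = classifier(seq_repr)   # (3, 5)
print(logits.shape)  # torch.Size([3, 5])
```

For a bidirectional LSTM the final forward and backward states are usually concatenated instead, since each direction's last state summarizes the sequence from one end.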