Recurrent Neural Networks (RNNs) are designed to process sequential data. What is the main reason they are better suited for sequences than regular feedforward neural networks?
Think about how remembering past information helps understand sentences or time series.
RNNs contain recurrent connections (loops) that carry a hidden state from one time step to the next, letting them retain information from previous steps and capture the order and context in a sequence. This is why they work well with text, speech, and time series.
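The recurrence can be sketched in a few lines of NumPy. This is an illustrative simplification of a basic RNN cell (the weight sizes are arbitrary assumptions), not the internals of any particular library:

```python
import numpy as np

# Minimal sketch of an RNN cell's recurrence: the hidden state h carries
# information from earlier time steps into later ones.
rng = np.random.default_rng(0)
n_features, n_units = 8, 10
W_x = rng.normal(scale=0.1, size=(n_features, n_units))  # input weights
W_h = rng.normal(scale=0.1, size=(n_units, n_units))     # recurrent weights
b = np.zeros(n_units)

sequence = rng.normal(size=(7, n_features))  # 7 time steps of 8 features
h = np.zeros(n_units)                        # initial hidden state
for x_t in sequence:
    # The same weights are reused at every step; h mixes past and present input.
    h = np.tanh(x_t @ W_x + h @ W_h + b)
print(h.shape)  # (10,)
```

A feedforward network has no such loop: each input is processed independently, so nothing learned from step t is available at step t+1.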
Consider this TensorFlow code creating an RNN layer. What is the shape of the output tensor?
import tensorflow as tf

rnn_layer = tf.keras.layers.SimpleRNN(10, return_sequences=True)
input_data = tf.random.uniform((5, 7, 8))  # batch=5, time=7, features=8
output = rnn_layer(input_data)
print(output.shape)
return_sequences=True means the output for every time step is returned, not just the last one.
The input has batch size 5, sequence length 7, and 8 features. The layer has 10 units, so it outputs 10 features per time step. With return_sequences=True, the output shape is (5, 7, 10).
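It can help to contrast this with the default return_sequences=False, which keeps only the last time step's output. A small check, using the same shapes as the question:

```python
import tensorflow as tf

x = tf.random.uniform((5, 7, 8))  # batch=5, time=7, features=8
seq_layer = tf.keras.layers.SimpleRNN(10, return_sequences=True)
last_layer = tf.keras.layers.SimpleRNN(10)  # return_sequences=False (default)

print(seq_layer(x).shape)   # (5, 7, 10) -- one 10-dim output per time step
print(last_layer(x).shape)  # (5, 10)    -- only the final step's output
```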
You want to predict the next word in a sentence, but the sentence can be very long and the important context might be far back. Which model is best suited?
Think about which model remembers information for a long time.
LSTM (Long Short-Term Memory) networks use gating mechanisms to retain information over long sequences, mitigating the vanishing-gradient problem that limits simple RNNs. CNNs and plain feedforward networks do not handle long-range dependencies well.
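A minimal sketch of an LSTM-based next-word model in Keras. The vocabulary size, embedding dimension, and sequence length here are illustrative assumptions, not values from the question:

```python
import tensorflow as tf

# Toy next-word predictor: word ids -> embeddings -> LSTM -> vocabulary softmax.
vocab_size = 1000
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=32),
    tf.keras.layers.LSTM(64),  # gated memory keeps long-range context
    tf.keras.layers.Dense(vocab_size, activation="softmax"),  # next-word distribution
])

# 2 sentences of 50 word ids each; important context may sit far back in the 50 steps.
x = tf.random.uniform((2, 50), maxval=vocab_size, dtype=tf.int32)
probs = model(x)
print(probs.shape)  # (2, 1000): one distribution over the vocabulary per sentence
```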
When training an RNN on very long sequences, what is a common technique to improve training stability and speed?
Think about how to handle long sequences without losing too much context.
Truncated backpropagation through time (TBPTT) splits long sequences into shorter segments and backpropagates only within each segment. This reduces memory use and improves gradient flow, making training more stable and faster.
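One way to sketch the idea in TensorFlow: carry the hidden state across chunks so context survives, but cut the gradient at chunk boundaries with tf.stop_gradient so backpropagation never spans more than one chunk. The sequence length and chunk size below are illustrative assumptions:

```python
import tensorflow as tf

# A 100-step sequence processed in 5 chunks of 20 steps each.
chunk_len, n_chunks, n_features = 20, 5, 8
layer = tf.keras.layers.SimpleRNN(10, return_sequences=True, return_state=True)
long_sequence = tf.random.uniform((1, chunk_len * n_chunks, n_features))

state = None
for i in range(n_chunks):
    chunk = long_sequence[:, i * chunk_len:(i + 1) * chunk_len, :]
    out, state = layer(chunk, initial_state=state)
    # Truncate: the state's value flows forward, but no gradients flow back
    # into earlier chunks. In training you would compute loss and apply
    # gradients once per chunk here.
    state = tf.stop_gradient(state)

print(out.shape)  # (1, 20, 10): outputs for the last chunk
```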
You trained an RNN to predict the next character in a text sequence. After training, you want to measure how well it predicts the sequence. Which metric is most appropriate?
Think about the task: predicting the next character exactly.
Accuracy measures the fraction of predicted characters that match the true next characters, which suits character-level prediction directly. MSE is for regression, BLEU is for sentence-level translation quality, and a confusion matrix gives more detail but is less direct as a single metric here.
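Character-level accuracy is simple to compute by hand. A toy sketch with made-up strings (the data here is purely illustrative):

```python
import numpy as np

# Compare predicted next characters against the true next characters.
true_next = np.array(list("hello"))
pred_next = np.array(list("hellp"))  # the model got the last character wrong

accuracy = np.mean(true_next == pred_next)
print(accuracy)  # 0.8 -- 4 of 5 characters predicted correctly
```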