You want to build a text generation model using TensorFlow. Which RNN layer is best suited to capture long-range dependencies in text?
Think about which layer can remember information over many time steps.
LSTM (Long Short-Term Memory) layers use input, forget, and output gates to retain information over long sequences, which mitigates the vanishing-gradient problem and makes them well suited to text generation tasks.
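A minimal sketch of the difference this makes in practice (the layer sizes here are arbitrary, chosen only for illustration): an LSTM can either summarize a whole sequence into its final hidden state, or emit one hidden state per time step, which is what a text generation head needs.

```python
import tensorflow as tf

# Toy batch: 2 sequences of 5 time steps with 8 features each.
x = tf.random.normal((2, 5, 8))

# With return_sequences=False (the default) the LSTM emits only the
# final hidden state, which summarizes the whole sequence.
last_state = tf.keras.layers.LSTM(16)(x)
print(last_state.shape)  # (2, 16)

# With return_sequences=True it emits a hidden state per time step.
all_states = tf.keras.layers.LSTM(16, return_sequences=True)(x)
print(all_states.shape)  # (2, 5, 16)
```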
Consider this TensorFlow code snippet for a text generation model:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=64),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.Dense(1000)
])
output_shape = model.output_shape
What is the value of output_shape?
Check the last Dense layer's output units and the return_sequences=True setting.
The LSTM outputs a full sequence (because return_sequences=True), so its output shape is (batch_size, sequence_length, 128). The Dense layer is applied to each time step independently, mapping 128 features to 1000 units, so the final output shape is (batch_size, sequence_length, 1000). Keras reports this as (None, None, 1000), since neither the batch size nor the sequence length is fixed.
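This can be checked directly. One caveat: a Sequential model only defines output_shape once it knows its input shape, so the sketch below adds an explicit Input layer (an assumption not in the original snippet) to build the model up front.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Input layer added so the model is built and output_shape is defined;
    # shape=(None,) means variable-length integer token sequences.
    tf.keras.layers.Input(shape=(None,)),
    tf.keras.layers.Embedding(input_dim=1000, output_dim=64),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.Dense(1000),
])
print(model.output_shape)  # (None, None, 1000)
```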
When training an RNN for text generation, what is the main effect of increasing the input sequence length?
Think about how longer sequences affect the amount of data processed per training step.
Longer input sequences mean the model processes, and backpropagates through, more time steps per batch, increasing both computation time and the memory needed to store per-step activations.
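A rough back-of-envelope estimate of the memory effect, under the simplifying assumption that the dominant cost is storing one hidden-state vector per time step (the batch size and layer width below are hypothetical):

```python
# Back-of-envelope estimate of per-batch activation memory for an LSTM
# with return_sequences=True, which stores one hidden-state vector per
# time step, so memory grows linearly with sequence length.
def lstm_activation_bytes(batch_size, seq_len, units, bytes_per_float=4):
    return batch_size * seq_len * units * bytes_per_float

short_seq = lstm_activation_bytes(batch_size=32, seq_len=50, units=128)
long_seq = lstm_activation_bytes(batch_size=32, seq_len=200, units=128)
print(long_seq / short_seq)  # 4.0 -- 4x the time steps, 4x the memory
```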
Which metric is most appropriate to evaluate the quality of a text generation RNN model during training?
Consider a metric that measures how well the model predicts the next word probabilities.
Perplexity measures how well a probability model predicts a sample; it is the exponential of the average per-token cross-entropy loss. Lower perplexity means the model assigns higher probability to the actual next tokens.
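The relationship between cross-entropy and perplexity can be worked through on a toy example (the probabilities below are made up for illustration):

```python
import math

# Probabilities the model assigned to each correct next token.
p_correct = [0.5, 0.25, 0.125, 0.5]

# Average per-token cross-entropy (negative log-likelihood).
cross_entropy = -sum(math.log(p) for p in p_correct) / len(p_correct)

# Perplexity is the exponential of that average.
perplexity = math.exp(cross_entropy)
print(round(perplexity, 3))  # 3.364, i.e. 2**1.75 since these are powers of 1/2
```

Intuitively, a perplexity of about 3.4 means the model is, on average, as uncertain as if it were choosing uniformly among 3 to 4 tokens at each step.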
You trained an RNN text generation model but it fails to learn long-term dependencies. Which of the following is the most likely cause?
Think about which RNN type struggles with remembering information over many steps.
SimpleRNN layers lack gating mechanisms and often suffer from vanishing gradients, making it hard to learn long-term dependencies.
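The vanishing-gradient effect can be sketched with simple arithmetic: in backpropagation through time, the gradient flowing to early time steps is repeatedly scaled by a factor tied to the recurrent weights, and when that factor sits below 1 the signal decays exponentially (the 0.9 factor here is an illustrative assumption, not a measured value).

```python
# Minimal illustration of why ungated RNNs forget: backprop through
# time repeatedly multiplies the gradient by (roughly) the recurrent
# scaling factor. Below 1, the early-step signal decays exponentially.
recurrent_factor = 0.9
grad = 1.0
for step in range(100):  # 100 time steps of backpropagation
    grad *= recurrent_factor
print(grad)  # ~2.7e-05 -- the gradient from early steps has all but vanished
```

LSTM gates counter this by letting the cell state carry information forward largely unscaled, which is why switching SimpleRNN for LSTM (or GRU) is the usual fix.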