Consider the following TensorFlow code snippet creating an LSTM layer:
import tensorflow as tf

lstm_layer = tf.keras.layers.LSTM(10, return_sequences=True)
input_tensor = tf.random.uniform((32, 5, 8))  # batch=32, time_steps=5, features=8
output = lstm_layer(input_tensor)
print(output.shape)
What is the printed output shape?
Remember that return_sequences=True makes the LSTM return output for each time step.
The input shape is (batch_size=32, time_steps=5, features=8). With return_sequences=True, the output shape is (batch_size, time_steps, units), so (32, 5, 10).
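A quick way to verify this is to compare both settings side by side; a minimal sketch using the same input shape as the snippet above:

```python
import tensorflow as tf

x = tf.random.uniform((32, 5, 8))  # batch=32, time_steps=5, features=8

# return_sequences=True: one 10-dim output per time step
seq_out = tf.keras.layers.LSTM(10, return_sequences=True)(x)
print(seq_out.shape)   # (32, 5, 10)

# default return_sequences=False: only the final time step's output
last_out = tf.keras.layers.LSTM(10)(x)
print(last_out.shape)  # (32, 10)
```

With the default setting the time-step axis disappears, which is why return_sequences=True is needed whenever the next layer also expects a sequence.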
You want to build a model to predict the next word in a sentence based on previous words. Which model layer is best suited for this task?
Think about which layer can remember information over time steps.
LSTM layers are designed to handle sequence data and remember information over time, making them ideal for predicting the next word in a sentence.
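A typical next-word model stacks an Embedding layer, an LSTM, and a softmax over the vocabulary. A minimal sketch; the vocabulary size, sequence length, and layer widths here are illustrative choices, not values from the question:

```python
import tensorflow as tf

vocab_size = 1000   # illustrative values
seq_len = 20
embed_dim = 64

# Embedding -> LSTM -> softmax distribution over the next word
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])

tokens = tf.random.uniform((4, seq_len), maxval=vocab_size, dtype=tf.int32)
probs = model(tokens)
print(probs.shape)  # (4, 1000): one next-word distribution per sequence
```

Each row of the output is a probability distribution over the vocabulary, from which the most likely next word can be taken with argmax.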
What is the most likely effect of increasing the number of units in an LSTM layer from 50 to 200?
More units mean more parameters and complexity.
Increasing units increases model capacity and parameters, which can improve accuracy but also increase overfitting risk and training time.
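The growth in parameters can be checked with the standard LSTM parameter formula, 4 × units × (features + units + 1), which counts the kernel, recurrent kernel, and bias. A plain-Python sketch, assuming 8 input features as in the earlier snippet:

```python
def lstm_param_count(units, features):
    # kernel: (features, 4*units), recurrent kernel: (units, 4*units),
    # bias: (4*units,)  ->  4 * units * (features + units + 1) in total
    return 4 * units * (features + units + 1)

print(lstm_param_count(50, 8))   # 11800
print(lstm_param_count(200, 8))  # 167200
```

Going from 50 to 200 units thus multiplies the parameter count by roughly 14, because the recurrent kernel grows quadratically in the number of units.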
While training an LSTM model, you observe that the training loss steadily decreases but the validation loss starts increasing after some epochs. What does this indicate?
Think about what it means when validation loss worsens but training loss improves.
When training loss decreases but validation loss increases, the model is learning the training data too well and not generalizing, which is overfitting.
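One common remedy is to stop training as soon as validation loss stops improving. A minimal sketch using Keras's EarlyStopping callback; the model and the synthetic data here are placeholders, not part of the question:

```python
import tensorflow as tf

# Synthetic sequence data: 64 samples, 5 time steps, 8 features
x = tf.random.uniform((64, 5, 8))
y = tf.random.uniform((64, 1))

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(10, input_shape=(5, 8)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop when validation loss has not improved for 2 consecutive epochs,
# and roll back to the best weights seen so far
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True
)
history = model.fit(x, y, validation_split=0.25, epochs=10,
                    callbacks=[early_stop], verbose=0)
print(len(history.history["loss"]))  # epochs actually run (at most 10)
```

Dropout, weight regularization, or simply more training data are other standard ways to curb the overfitting the question describes.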
Given this code snippet:
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.LSTM(32, input_shape=(10,)),
tf.keras.layers.Dense(1)
])
input_data = tf.random.uniform((64, 10, 5))
output = model(input_data)
print(output.shape)

What error will this code raise?
Check the input_shape parameter and the actual input data shape.
An LSTM expects 3-D input of shape (batch, time_steps, features), so input_shape must name two dimensions, e.g. input_shape=(10, 5). Here input_shape=(10,) declares a 2-D input of shape (batch, 10), so Keras raises a ValueError when the LSTM layer is built (expected ndim=3, found ndim=2) — the actual data shape (64, 10, 5) never even gets a chance to match.
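A fixed version declares both the time-step and feature dimensions; a sketch reusing the layer sizes from the snippet above:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(10, 5)),  # (time_steps, features)
    tf.keras.layers.Dense(1),
])

input_data = tf.random.uniform((64, 10, 5))
output = model(input_data)
print(output.shape)  # (64, 1)
```

Since return_sequences is left at its default of False here, the LSTM emits only the final time step's output, which the Dense layer maps to a single value per sample.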