LSTM for text in NLP - ML Experiment: Train & Evaluate

Experiment - LSTM for text
Problem: We want to build a model that predicts the next word in a sentence using an LSTM network trained on a small text dataset.
Current Metrics: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85
Issue: The model is overfitting. Training accuracy is very high while validation accuracy is much lower, which indicates poor generalization.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
You can only change the model architecture and training hyperparameters.
Do not change the dataset or preprocessing steps.
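The targets above can be written as a small check. This is only an illustrative sketch: `generalization_gap` and `meets_targets` are hypothetical helper names, not part of the exercise, and the thresholds are the ones stated in the task.

```python
def generalization_gap(train_acc, val_acc):
    """Training accuracy minus validation accuracy, in percentage points."""
    return train_acc - val_acc


def meets_targets(train_acc, val_acc, min_val=85.0, max_train=92.0):
    """True when validation accuracy is at least min_val
    and training accuracy stays below max_train."""
    return val_acc >= min_val and train_acc < max_train


# Current metrics from the problem statement
print(generalization_gap(98.0, 70.0))  # 28.0
print(meets_targets(98.0, 70.0))       # False
```

A large positive gap, as in the current metrics, is the usual symptom of overfitting.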
Solution
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# Sample data preparation (dummy example)
texts = ["hello how are you", "how are you doing", "hello what is your name", "what is your favorite color"]

# Tokenization and sequence preparation
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = []
for line in texts:
    encoded = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(encoded)):
        sequence = encoded[:i+1]
        sequences.append(sequence)

max_len = max(len(seq) for seq in sequences)
sequences = pad_sequences(sequences, maxlen=max_len, padding='pre')

sequences = np.array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]
y = tf.keras.utils.to_categorical(y, num_classes=len(tokenizer.word_index)+1)

# Model with dropout and fewer LSTM units
model = Sequential([
    Embedding(input_dim=len(tokenizer.word_index)+1, output_dim=10, input_length=max_len-1),
    LSTM(32, return_sequences=False),
    Dropout(0.3),
    Dense(len(tokenizer.word_index)+1, activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), metrics=['accuracy'])

# Early stopping callback
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

history = model.fit(X, y, epochs=30, batch_size=4, validation_split=0.2, callbacks=[early_stop], verbose=0)

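# Report metrics from the epoch with the lowest val_loss. That is the epoch
# whose weights EarlyStopping restores when it stops training
# (restore_best_weights=True), so the last history entry may not match the model.
best = int(np.argmin(history.history['val_loss']))
train_acc = history.history['accuracy'][best] * 100
val_acc = history.history['val_accuracy'][best] * 100
train_loss = history.history['loss'][best]
val_loss = history.history['val_loss'][best]

print(f"Training accuracy: {train_acc:.2f}%, Validation accuracy: {val_acc:.2f}%")
print(f"Training loss: {train_loss:.4f}, Validation loss: {val_loss:.4f}")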
Reduced LSTM units from 64 to 32 to simplify the model.
Added a Dropout layer with rate 0.3 after the LSTM to reduce overfitting.
Added EarlyStopping callback to stop training when validation loss stops improving.
Lowered the learning rate to 0.001 (which is also Adam's default in Keras) for smoother training.
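To make the early stopping change concrete, here is a simplified, framework-free sketch of the patience logic. It is an approximation for illustration, not Keras's implementation: it ignores options such as `min_delta`, and the function name and loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once val_loss has failed to improve for `patience`
    consecutive epochs. best_epoch is the epoch with the lowest val_loss
    seen so far, i.e. the one whose weights would be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran through every epoch without triggering early stopping
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improvement stalls after epoch 3
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.56, 0.60]
print(early_stopping_epoch(losses))  # (6, 3)
```

With `patience=3`, training halts at epoch 6 and the weights from epoch 3 are the ones worth keeping.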
Results Interpretation

Before: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85

After: Training accuracy: 90%, Validation accuracy: 87%, Training loss: 0.20, Validation loss: 0.35

Adding dropout and reducing model complexity helps prevent overfitting, improving validation accuracy and making the model generalize better to new data.
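As a rough illustration of what the Dropout layer does, the sketch below applies inverted dropout to a plain list of activations. It assumes the standard formulation (zero each unit with probability `rate` during training and scale the survivors by `1 / (1 - rate)`); the function name and inputs are hypothetical.

```python
import random


def dropout(values, rate, training=True, rng=None):
    """Inverted dropout on a list of floats.

    During training each value is zeroed with probability `rate`, and the
    kept values are scaled up so the expected sum is unchanged.
    At inference time the values pass through untouched.
    """
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


activations = [1.0] * 10
print(dropout(activations, rate=0.3, rng=random.Random(0)))  # some zeros, rest scaled up
print(dropout(activations, rate=0.3, training=False))        # unchanged at inference
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which is why dropout tends to narrow the gap between training and validation accuracy.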
Bonus Experiment
Try using a bidirectional LSTM layer instead of a single LSTM layer and observe how it affects validation accuracy and overfitting.
💡 Hint
Replace the LSTM layer with tf.keras.layers.Bidirectional wrapping the LSTM, and keep dropout and early stopping.
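A minimal sketch of that hint, assuming the same layer sizes as the solution. The function name `build_bidirectional_model` and its default arguments are illustrative choices; `vocab_size=12` and `seq_len=4` match the sample data in the solution (11 distinct words plus the padding index, and prefixes of up to 4 tokens).

```python
from tensorflow.keras import layers, models


def build_bidirectional_model(vocab_size, seq_len, embed_dim=10,
                              lstm_units=32, dropout_rate=0.3):
    """Next-word model with the LSTM wrapped in Bidirectional."""
    model = models.Sequential([
        layers.Input(shape=(seq_len,)),
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        # Runs one LSTM forward and one backward over the sequence and, by
        # default, concatenates their outputs (2 * lstm_units features)
        layers.Bidirectional(layers.LSTM(lstm_units)),
        layers.Dropout(dropout_rate),
        layers.Dense(vocab_size, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model


model = build_bidirectional_model(vocab_size=12, seq_len=4)
print(model.output_shape)  # (None, 12)
```

Wrapping the LSTM roughly doubles its recurrent parameters, so on a dataset this small it may increase overfitting rather than reduce it. Keep the same EarlyStopping callback when fitting and compare validation loss against the unidirectional run.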

Practice

1. What is the main advantage of using an LSTM model for text data?
easy
A. It converts text directly into images.
B. It removes all punctuation from the text.
C. It remembers the order of words in a sentence.
D. It translates text into multiple languages.

Solution

  1. Step 1: Understand LSTM's role in text

    LSTM models are designed to remember sequences, which means they keep track of word order in sentences.
  2. Step 2: Compare options with LSTM function

    Only option C, "It remembers the order of words in a sentence.", correctly describes what an LSTM does. The other options describe unrelated tasks.
  3. Final Answer:

    It remembers the order of words in a sentence. -> Option C
  4. Quick Check:

    LSTM remembers word order = C [OK]
Hint: LSTM = memory for word order in text [OK]
Common Mistakes:
  • Thinking LSTM translates languages
  • Confusing LSTM with image processing
  • Assuming LSTM removes punctuation
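A short sketch of why word order matters. A bag-of-words count, which ignores order, cannot tell the two made-up sentences below apart, while the token sequences that an LSTM reads step by step are different.

```python
from collections import Counter

a = "dog bites man".split()
b = "man bites dog".split()

# Order-insensitive view: identical word counts
print(Counter(a) == Counter(b))  # True

# Order-sensitive view: the sequences differ
print(a == b)  # False
```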
2. Which of the following is the correct way to add an LSTM layer in Keras for text input?
easy
A. model.add(LSTM(128, input_shape=(timesteps, features)))
B. model.add(Dense(128, input_shape=(timesteps, features)))
C. model.add(Conv2D(128, kernel_size=3))
D. model.add(Embedding(128, input_shape=(timesteps, features)))

Solution

  1. Step 1: Identify LSTM layer syntax in Keras

    An LSTM layer is added with LSTM(units, input_shape=(timesteps, features)). Option A, model.add(LSTM(128, input_shape=(timesteps, features))), matches this syntax.
  2. Step 2: Check other options for correctness

    Option B adds a Dense layer, not an LSTM. Option C adds a Conv2D layer, which is meant for images. Option D adds an Embedding layer, not an LSTM.
  3. Final Answer:

    model.add(LSTM(128, input_shape=(timesteps, features))) -> Option A
  4. Quick Check:

    LSTM layer syntax = A [OK]
Hint: LSTM layer uses LSTM(), not Dense or Conv2D [OK]
Common Mistakes:
  • Using Dense instead of LSTM for sequence data
  • Confusing Embedding with LSTM layer
  • Applying Conv2D for text input
3. Given this code snippet, what will be the shape of the output from the LSTM layer?
model = Sequential()
model.add(Embedding(input_dim=1000, output_dim=64, input_length=10))
model.add(LSTM(32))
output = model.output_shape
medium
A. (None, 10, 32)
B. (None, 32)
C. (None, 64)
D. (10, 32)

Solution

  1. Step 1: Understand Embedding and LSTM output shapes

    The Embedding layer outputs (batch_size, 10, 64). The LSTM with 32 units returns (batch_size, 32) by default (last output only).
  2. Step 2: Match output shape with options

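    Option B, (None, 32), matches this shape, where None stands for the batch size. Option A would be the shape with return_sequences=True, option C confuses the embedding dimension with the LSTM units, and option D omits the batch dimension.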
  3. Final Answer:

    (None, 32) -> Option B
  4. Quick Check:

    LSTM output shape = (None, 32) [OK]
Hint: LSTM returns (batch, units) by default, not sequence [OK]
Common Mistakes:
  • Assuming LSTM outputs full sequence by default
  • Confusing embedding output with LSTM output
  • Ignoring batch size dimension
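The shape reasoning in question 3 can be traced with two small helpers. They are hypothetical functions that only mirror the shape rules described in the solution; they do not call Keras.

```python
def embedding_output_shape(input_length, output_dim):
    """Embedding maps (batch, input_length) to (batch, input_length, output_dim)."""
    return (None, input_length, output_dim)


def lstm_output_shape(input_shape, units, return_sequences=False):
    """LSTM returns only the last step unless return_sequences=True."""
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)
    return (batch, units)


emb = embedding_output_shape(input_length=10, output_dim=64)
print(emb)                                                      # (None, 10, 64)
print(lstm_output_shape(emb, units=32))                         # (None, 32)
print(lstm_output_shape(emb, units=32, return_sequences=True))  # (None, 10, 32)
```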
4. Identify the error in this LSTM model code for text classification:
model = Sequential()
model.add(LSTM(64, input_shape=(100,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')
medium
A. Optimizer 'adam' is not suitable for LSTM models
B. Dense layer activation should be 'relu' for binary classification
C. Loss function should be 'categorical_crossentropy' for binary output
D. Input shape should be 2D, e.g., (timesteps, features), not (100,)

Solution

  1. Step 1: Check input shape for LSTM layer

    LSTM expects input shape as (timesteps, features). Here, (100,) is 1D, missing feature dimension.
  2. Step 2: Validate other components

    Binary classification uses sigmoid activation and binary_crossentropy loss correctly. Adam optimizer is suitable.
  3. Final Answer:

    Input shape should be 2D, e.g., (timesteps, features), not (100,) -> Option D
  4. Quick Check:

    LSTM input shape must be 2D = D [OK]
Hint: LSTM input shape needs (timesteps, features) [OK]
Common Mistakes:
  • Using 1D input shape for LSTM
  • Changing activation incorrectly for binary tasks
  • Mixing loss functions for binary classification
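One possible fix for the model in question 4, assuming each input is a padded sequence of 100 integer token IDs (the question does not say). An Embedding layer placed before the LSTM supplies the missing feature dimension. The vocabulary size of 5000 and embedding width of 32 are made-up values for illustration.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(100,)),                       # 100 token IDs per review
    layers.Embedding(input_dim=5000, output_dim=32),  # -> (batch, 100, 32)
    layers.LSTM(64),                                  # -> (batch, 64)
    layers.Dense(1, activation='sigmoid'),            # -> (batch, 1)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

print(model.output_shape)  # (None, 1)
```

If the inputs were instead 100 numeric readings per sample, declaring the LSTM input shape as (100, 1) would be the analogous fix.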
5. You want to build an LSTM model to classify movie reviews as positive or negative. Which approach best improves model understanding of word meaning before LSTM processing?
hard
A. Add an Embedding layer to convert words into dense vectors before the LSTM.
B. Use a Dense layer directly on raw text input before LSTM.
C. Apply a Conv2D layer to the text input before LSTM.
D. Skip preprocessing and feed raw text strings directly to LSTM.

Solution

  1. Step 1: Understand preprocessing for text in LSTM models

    Embedding layers convert words into meaningful numeric vectors, helping LSTM understand word relationships.
  2. Step 2: Evaluate other options

    Dense layers expect numeric input, not raw text. Conv2D is for images. Feeding raw strings to LSTM causes errors.
  3. Final Answer:

    Add an Embedding layer to convert words into dense vectors before the LSTM. -> Option A
  4. Quick Check:

    Embedding before LSTM = A [OK]
Hint: Use Embedding layer to convert words before LSTM [OK]
Common Mistakes:
  • Feeding raw text directly to LSTM
  • Using Dense or Conv2D layers on raw text
  • Skipping word vector conversion
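Conceptually, an embedding layer is a lookup table from word index to dense vector. The sketch below shows only the lookup, using small random values; in a real model the table entries are learned during training. The helper names and sizes are hypothetical.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """One row of `dim` small random floats per word index."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)]


def embed(token_ids, table):
    """Replace each token ID with its row from the table."""
    return [table[i] for i in token_ids]


table = make_embedding_table(vocab_size=6, dim=4)
vectors = embed([1, 3, 3, 5], table)

print(len(vectors), len(vectors[0]))  # 4 4
print(vectors[1] == vectors[2])       # True: the same word maps to the same vector
```

The LSTM then receives one dense vector per timestep instead of a bare integer ID.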