LSTM for text in NLP - ML Experiment: Train & Evaluate

Experiment - LSTM for text

Problem: We want to build a model that can predict the next word in a sentence using an LSTM network on a small text dataset.

Current Metrics: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85

Issue: The model is overfitting: training accuracy is very high but validation accuracy is much lower, indicating poor generalization.

Your Task

Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.

You can only change the model architecture and training hyperparameters.

Do not change the dataset or preprocessing steps.
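
The target above can be written down as a small check, which is handy when comparing runs. This is only an illustrative sketch: the function names `generalization_gap` and `meets_target` are hypothetical, and the thresholds are the ones stated in the task (validation accuracy of at least 85%, training accuracy below 92%).

```python
def generalization_gap(train_acc: float, val_acc: float) -> float:
    """Difference between training and validation accuracy, in percentage points."""
    return train_acc - val_acc


def meets_target(train_acc: float, val_acc: float) -> bool:
    """Task criteria: validation accuracy >= 85% and training accuracy < 92%."""
    return val_acc >= 85.0 and train_acc < 92.0


# Baseline metrics from the problem statement.
print(generalization_gap(98.0, 70.0))  # 28.0
print(meets_target(98.0, 70.0))        # False
```

A large gap with high training accuracy is the overfitting signature described in the issue; the task is met once `meets_target` returns True.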

Solution

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample data preparation (dummy example)
texts = ["hello how are you", "how are you doing", "hello what is your name", "what is your favorite color"]

# Tokenization: map each word to an integer id (0 is reserved for padding)
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
vocab_size = len(tokenizer.word_index) + 1

# Build every prefix of length >= 2; the last token of each prefix is the target
sequences = []
for line in texts:
    encoded = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(encoded)):
        sequences.append(encoded[:i + 1])

# Left-pad so all sequences share one length and the target stays in the last column
max_len = max(len(seq) for seq in sequences)
sequences = pad_sequences(sequences, maxlen=max_len, padding='pre')

# Inputs are all tokens but the last; the last token is the one-hot target
X, y = sequences[:, :-1], sequences[:, -1]
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)

# Model with dropout and fewer LSTM units
model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=10),
    LSTM(32, return_sequences=False),
    Dropout(0.3),
    Dense(vocab_size, activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), metrics=['accuracy'])

# Early stopping callback
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# validation_split holds out the last 20% of the samples (taken before shuffling)
history = model.fit(X, y, epochs=30, batch_size=4, validation_split=0.2, callbacks=[early_stop], verbose=0)

# Report metrics at the epoch with the lowest validation loss, which is the
# epoch whose weights EarlyStopping restores when it stops training
best = int(np.argmin(history.history['val_loss']))
train_acc = history.history['accuracy'][best] * 100
val_acc = history.history['val_accuracy'][best] * 100
train_loss = history.history['loss'][best]
val_loss = history.history['val_loss'][best]

print(f"Training accuracy: {train_acc:.2f}%, Validation accuracy: {val_acc:.2f}%")
print(f"Training loss: {train_loss:.4f}, Validation loss: {val_loss:.4f}")
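
To make the sequence preparation step concrete, here is a minimal pure-Python sketch of how one sentence turns into (input, target) pairs. The helper name `make_prefix_examples` and the token ids are hypothetical; the real Tokenizer assigns ids by word frequency, so actual ids will differ.

```python
def make_prefix_examples(token_ids, max_len):
    """Return (inputs, targets) for every prefix of length >= 2, left-padded with 0."""
    inputs, targets = [], []
    for i in range(1, len(token_ids)):
        prefix = token_ids[: i + 1]
        # Pad on the left so the target always sits in the last position.
        padded = [0] * (max_len - len(prefix)) + prefix
        inputs.append(padded[:-1])
        targets.append(padded[-1])
    return inputs, targets


# Hypothetical ids for "hello how are you": hello=1, how=2, are=3, you=4.
X, y = make_prefix_examples([1, 2, 3, 4], max_len=5)
for features, target in zip(X, y):
    print(features, "->", target)
# [0, 0, 0, 1] -> 2
# [0, 0, 1, 2] -> 3
# [0, 1, 2, 3] -> 4
```

Each row asks the model to predict the next word from the words seen so far, which is why a four-word sentence yields three training examples.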

Reduced LSTM units from 64 to 32 to simplify the model.

Added a Dropout layer with rate 0.3 after the LSTM to reduce overfitting.

Added EarlyStopping callback to stop training when validation loss stops improving.

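Lowered learning rate to 0.001 for smoother training. Note that 0.001 is also Adam's default in Keras, so this only has an effect if the baseline used a higher rate.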
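
The early stopping rule from the list above can be sketched in plain Python. This is a simplified illustration of the idea (patience counted in epochs without improvement in validation loss), not Keras's actual implementation; the function name `early_stopping_epoch` and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improvement stalls after epoch 2.
stop, best = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5])
print(stop, best)  # 5 2
```

With `restore_best_weights=True`, the model kept after stopping is the one from `best`, not from `stop`, so the later improvement at epoch 6 in this made-up series is never reached.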

Results Interpretation

Before: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85

After: Training accuracy: 90%, Validation accuracy: 87%, Training loss: 0.20, Validation loss: 0.35

Adding dropout and reducing model complexity help prevent overfitting: the gap between training and validation accuracy shrinks from 28 points to 3, and validation loss falls from 0.85 to 0.35, so the model generalizes better to new data.
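
For intuition about what the Dropout layer does during training, here is a small sketch of inverted dropout in plain Python. It is an illustration under simplifying assumptions (a flat list of activations, a hypothetical `dropout` helper), not the Keras layer itself, which also switches itself off at inference time.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10
dropped = dropout(activations, rate=0.3, rng=rng)
print(dropped)
```

Because a different random subset of units is silenced on every step, the network cannot rely on any single unit, which is the regularizing effect. Scaling the survivors keeps the expected activation unchanged.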

Bonus Experiment

Try using a bidirectional LSTM layer instead of a single LSTM layer and observe how it affects validation accuracy and overfitting.

💡 Hint

Replace the LSTM layer with tf.keras.layers.Bidirectional wrapping the LSTM, and keep dropout and early stopping.
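
One possible starting point for the bonus experiment is sketched below. The function name `build_bidirectional_model` is hypothetical; the vocabulary size of 12 and input length of 4 are what the dummy dataset in the solution produces (11 distinct words plus the padding id, and a longest sequence of 5 tokens minus the target).

```python
import tensorflow as tf
from tensorflow.keras.layers import Bidirectional, Dense, Dropout, Embedding, LSTM


def build_bidirectional_model(vocab_size: int, seq_len: int) -> tf.keras.Model:
    """Same architecture as the solution, with the LSTM wrapped in Bidirectional."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len,)),
        Embedding(input_dim=vocab_size, output_dim=10),
        # Runs one LSTM forward and one backward and concatenates their outputs.
        Bidirectional(LSTM(32)),
        Dropout(0.3),
        Dense(vocab_size, activation="softmax"),
    ])
    model.compile(
        loss="categorical_crossentropy",
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        metrics=["accuracy"],
    )
    return model


model = build_bidirectional_model(vocab_size=12, seq_len=4)
model.summary()
```

Note that the wrapper trains two LSTMs, so the recurrent layer has twice as many parameters and feeds 64 features to the Dense layer. On a small dataset that extra capacity may widen the train/validation gap again, which is worth checking in the results.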

Practice

1. What is the main advantage of using an LSTM model for text data?

easy

A. It converts text directly into images.

B. It removes all punctuation from the text.

C. It remembers the order of words in a sentence.

D. It translates text into multiple languages.

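Solution

Step 1: Understand LSTM's role in text. An LSTM reads tokens one at a time and carries a hidden state forward, so its output depends on the order of the words.

Step 2: Compare options with LSTM function. Converting text to images (A), removing punctuation (B), and translating text (D) are not things an LSTM layer does by itself; only C describes sequence memory.

Final Answer: C. It remembers the order of words in a sentence.

Quick Check: An LSTM is a sequence model, so word order matters.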
