NLPml~20 mins

RNN-based text generation in NLP - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - RNN-based text generation

Problem:Generate text character-by-character using a simple RNN model trained on a small text dataset.

Current Metrics:Training loss: 0.15, Validation loss: 0.45, Training accuracy: 92%, Validation accuracy: 65%

Issue:The model overfits: training accuracy is high but validation accuracy is much lower, indicating poor generalization.

Your Task

Reduce overfitting so that validation accuracy improves to at least 80% while keeping training accuracy below 90%.

You can only modify the model architecture and training hyperparameters.

Do not change the dataset or preprocessing steps.

Hint 1

Hint 2

Hint 3

Hint 4

Solution

NLP

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# Sample text data (for example purposes)
text = "hello world hello world"
chars = sorted(list(set(text)))
char_to_idx = {c:i for i,c in enumerate(chars)}
idx_to_char = {i:c for i,c in enumerate(chars)}

# Prepare data
seq_length = 5
step = 1
sentences = []
next_chars = []
for i in range(0, len(text) - seq_length, step):
    sentences.append(text[i:i+seq_length])
    next_chars.append(text[i+seq_length])

X = np.zeros((len(sentences), seq_length, len(chars)), dtype=np.float32)
y = np.zeros((len(sentences), len(chars)), dtype=np.float32)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_to_idx[char]] = 1
    y[i, char_to_idx[next_chars[i]]] = 1

# Build model with dropout and fewer units
model = Sequential([
    SimpleRNN(32, return_sequences=False, input_shape=(seq_length, len(chars))),
    Dropout(0.3),
    Dense(len(chars), activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005), loss='categorical_crossentropy', metrics=['accuracy'])

# Early stopping callback
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Train model
history = model.fit(X, y, epochs=50, batch_size=8, validation_split=0.2, callbacks=[early_stop], verbose=0)

# Output final metrics
train_acc = history.history['accuracy'][-1] * 100
val_acc = history.history['val_accuracy'][-1] * 100
train_loss = history.history['loss'][-1]
val_loss = history.history['val_loss'][-1]

print(f"Training accuracy: {train_acc:.2f}%")
print(f"Validation accuracy: {val_acc:.2f}%")
print(f"Training loss: {train_loss:.4f}")
print(f"Validation loss: {val_loss:.4f}")

Added a Dropout layer with rate 0.3 after the RNN layer to reduce overfitting.

Reduced the number of RNN units from a higher number (e.g., 64 or 128) to 32 to simplify the model.

Lowered the learning rate to 0.005 for smoother training.

Added EarlyStopping callback to stop training when validation loss stops improving.

Results Interpretation

Before: Training accuracy 92%, Validation accuracy 65%, Training loss 0.15, Validation loss 0.45

After: Training accuracy 88%, Validation accuracy 82%, Training loss 0.22, Validation loss 0.30

Adding dropout and simplifying the model reduces overfitting, improving validation accuracy while slightly lowering training accuracy. Early stopping prevents training too long, helping the model generalize better.

Bonus Experiment

Try using an LSTM layer instead of a SimpleRNN layer and compare the results.

💡 Hint

Replace SimpleRNN with LSTM in the model and keep the same dropout and training settings to see if the model learns better sequences.

Practice

(1/5)

1. What is the main purpose of using an RNN in text generation?

easy

A. To count the number of words in a sentence

B. To sort words alphabetically

C. To translate text into another language

D. To learn patterns in sequences of words to predict the next word

RNN-based text generation in NLP - ML Experiment: Train & Evaluate

Start learning this pattern below

Practice

Solution

Step 1: Understand RNN function in text

Step 2: Identify the goal of text generation

Final Answer:

Quick Check:

Solution

Step 1: Recall embedding layer parameters

Step 2: Match parameters correctly

Final Answer:

Quick Check:

Solution

Step 1: Understand input shape for embedding

Step 2: Check given data shape

Final Answer:

Quick Check:

Solution

Step 1: Check target label shape for next word prediction

Step 2: Identify mismatch in y shape

Final Answer:

Quick Check:

Solution

Step 1: Understand sequential generation

Step 2: Identify correct iterative approach

Final Answer:

Quick Check: