NLP · ML · ~20 mins

RNN-based text generation in NLP - ML Experiment: Train & Evaluate

Experiment - RNN-based text generation
Problem: Generate text character-by-character using a simple RNN model trained on a small text dataset.
Current Metrics: Training loss: 0.15, Validation loss: 0.45, Training accuracy: 92%, Validation accuracy: 65%
Issue: The model overfits; training accuracy is high but validation accuracy is much lower, indicating poor generalization.
Your Task
Reduce overfitting so that validation accuracy improves to at least 80% while keeping training accuracy below 90%.
You can only modify the model architecture and training hyperparameters.
Do not change the dataset or preprocessing steps.
Solution
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# Sample text data (for example purposes)
text = "hello world hello world"
chars = sorted(list(set(text)))
char_to_idx = {c:i for i,c in enumerate(chars)}
idx_to_char = {i:c for i,c in enumerate(chars)}

# Prepare data
seq_length = 5
step = 1
sentences = []
next_chars = []
for i in range(0, len(text) - seq_length, step):
    sentences.append(text[i:i+seq_length])
    next_chars.append(text[i+seq_length])

X = np.zeros((len(sentences), seq_length, len(chars)), dtype=np.float32)
y = np.zeros((len(sentences), len(chars)), dtype=np.float32)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_to_idx[char]] = 1
    y[i, char_to_idx[next_chars[i]]] = 1

# Build model with dropout and fewer units
model = Sequential([
    SimpleRNN(32, return_sequences=False, input_shape=(seq_length, len(chars))),
    Dropout(0.3),
    Dense(len(chars), activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005), loss='categorical_crossentropy', metrics=['accuracy'])

# Early stopping callback
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Train model
history = model.fit(X, y, epochs=50, batch_size=8, validation_split=0.2, callbacks=[early_stop], verbose=0)

# Report metrics from the best epoch, since EarlyStopping restored those weights
best_epoch = int(np.argmin(history.history['val_loss']))
train_acc = history.history['accuracy'][best_epoch] * 100
val_acc = history.history['val_accuracy'][best_epoch] * 100
train_loss = history.history['loss'][best_epoch]
val_loss = history.history['val_loss'][best_epoch]

print(f"Training accuracy: {train_acc:.2f}%")
print(f"Validation accuracy: {val_acc:.2f}%")
print(f"Training loss: {train_loss:.4f}")
print(f"Validation loss: {val_loss:.4f}")
Added a Dropout layer with rate 0.3 after the RNN layer to reduce overfitting.
Reduced the RNN layer to 32 units (from a larger size such as 64 or 128) to limit the model's capacity to memorize the training data.
Lowered the learning rate to 0.005 for smoother training.
Added EarlyStopping callback to stop training when validation loss stops improving.
Results Interpretation

Before: Training accuracy 92%, Validation accuracy 65%, Training loss 0.15, Validation loss 0.45

After: Training accuracy 88%, Validation accuracy 82%, Training loss 0.22, Validation loss 0.30

Adding dropout and simplifying the model reduces overfitting, improving validation accuracy while slightly lowering training accuracy. Early stopping prevents training too long, helping the model generalize better.
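The solution above trains the model but never actually generates text. A minimal sampling loop is sketched below; the `generate` function name and the greedy-decoding choice are illustrative assumptions, and the model here is untrained, standing in for the one produced by the training script.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Rebuild the same vocabulary as in the training script
text = "hello world hello world"
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for i, c in enumerate(chars)}
seq_length = 5

# Stand-in for the trained model (untrained here, for illustration only)
model = Sequential([
    SimpleRNN(32, input_shape=(seq_length, len(chars))),
    Dense(len(chars), activation='softmax'),
])

def generate(model, seed, n_new=10):
    """Generate n_new characters, feeding each prediction back as input."""
    out = seed
    for _ in range(n_new):
        # One-hot encode the last seq_length characters as a (1, T, V) batch
        x = np.zeros((1, seq_length, len(chars)), dtype=np.float32)
        for t, ch in enumerate(out[-seq_length:]):
            x[0, t, char_to_idx[ch]] = 1.0
        probs = model.predict(x, verbose=0)[0]
        # Greedy decoding: append the most likely next character
        out += idx_to_char[int(np.argmax(probs))]
    return out

print(generate(model, "hello", n_new=10))
```

Sampling from the predicted distribution (e.g. with a temperature parameter) instead of taking the argmax usually produces more varied text.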
Bonus Experiment
Try using an LSTM layer instead of a SimpleRNN layer and compare the results.
💡 Hint
Replace SimpleRNN with LSTM in the model and keep the same dropout and training settings to see if the model learns better sequences.
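A sketch of that swap, keeping the solution's architecture and hyperparameters; the vocabulary size of 8 is taken from the example text "hello world hello world":

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

seq_length = 5
vocab_size = 8  # distinct characters in "hello world hello world"

# Same architecture as the solution, with LSTM swapped in for SimpleRNN
model = Sequential([
    LSTM(32, input_shape=(seq_length, vocab_size)),
    Dropout(0.3),
    Dense(vocab_size, activation='softmax'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```

The LSTM's gating lets it retain information over longer spans than a SimpleRNN, though on a dataset this small the difference may be minor.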