Experiment - LSTM layer

Problem:We want to classify whether the sum of numbers in a sequence exceeds 5 using an LSTM model. The current model trains well but performs poorly on new data.

Current Metrics:Training accuracy: 98%, Validation accuracy: 65%, Training loss: 0.05, Validation loss: 0.9

Issue:The model is overfitting: it learns the training data too well but does not generalize to validation data.

Your Task

Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 95%.

You can only change the model architecture and training parameters.

Do not change the dataset or preprocessing steps.

Hint 1

Hint 2

Hint 3

Hint 4

Solution

TensorFlow

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# Generate dummy sequential data
np.random.seed(42)
X_train = np.random.rand(1000, 10, 1)  # 1000 samples, 10 time steps, 1 feature
y_train = (np.sum(X_train, axis=1) > 5).astype(int)  # Binary target
X_val = np.random.rand(200, 10, 1)
y_val = (np.sum(X_val, axis=1) > 5).astype(int)

# Build model with dropout and fewer units
model = Sequential([
    LSTM(32, input_shape=(10,1), return_sequences=False),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Early stopping callback
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_data=(X_val, y_val),
    callbacks=[early_stop],
    verbose=0
)

# Evaluate model
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
val_loss, val_acc = model.evaluate(X_val, y_val, verbose=0)

print(f'Training accuracy: {train_acc*100:.2f}%, Validation accuracy: {val_acc*100:.2f}%')

Reduced LSTM units from 64 to 32 to simplify the model.

Added a Dropout layer with rate 0.3 after the LSTM to reduce overfitting.

Added EarlyStopping callback to stop training when validation loss stops improving.

Kept batch size at 32 and used Adam optimizer for stable training.

Results Interpretation

Before: Training accuracy 98%, Validation accuracy 65%, Training loss 0.05, Validation loss 0.9

After: Training accuracy 92%, Validation accuracy 87%, Training loss 0.2, Validation loss 0.3

Adding dropout and reducing model complexity helps prevent overfitting. Early stopping stops training before the model memorizes training data, improving validation accuracy.

Bonus Experiment

Try using a Bidirectional LSTM layer instead of a single LSTM layer and observe how it affects accuracy and overfitting.

💡 Hint

Wrap the LSTM layer with tf.keras.layers.Bidirectional and keep dropout and early stopping.