Experiment - Sequence-to-sequence architecture

Problem: We want to build a model that translates simple English sentences to French using a sequence-to-sequence architecture.

Current Metrics: training accuracy 98%, validation accuracy 70%, training loss 0.05, validation loss 0.45.

Issue: The model is overfitting. Training accuracy is very high while validation accuracy is 28 points lower, which indicates poor generalization.

Your Task

Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.

You can only modify the model architecture and training hyperparameters.

Do not change the dataset or preprocessing steps.

Solution

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# Sample data loading and preprocessing assumed here
# For demonstration, we use dummy data shapes
num_encoder_tokens = 100
num_decoder_tokens = 100
max_encoder_seq_length = 10
max_decoder_seq_length = 10

# Define encoder
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(64, return_state=True, dropout=0.3, recurrent_dropout=0.3)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Define decoder
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(64, return_sequences=True, return_state=True, dropout=0.3, recurrent_dropout=0.3)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Compile model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

# Early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

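# Dummy one-hot data for demonstration (random tokens, so accuracy stays near chance)
def one_hot(token_ids, depth):
    return np.eye(depth, dtype="float32")[token_ids]

X_encoder = one_hot(np.random.randint(num_encoder_tokens, size=(1000, max_encoder_seq_length)), num_encoder_tokens)
decoder_ids = np.random.randint(num_decoder_tokens, size=(1000, max_decoder_seq_length + 1))
X_decoder = one_hot(decoder_ids[:, :-1], num_decoder_tokens)  # decoder input (teacher forcing)
y = one_hot(decoder_ids[:, 1:], num_decoder_tokens)  # target: decoder input shifted by one step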

# Train model
history = model.fit(
    [X_encoder, X_decoder], y,
    batch_size=64,
    epochs=30,
    validation_split=0.2,
    callbacks=[early_stopping]
)

Reduced the LSTM units from 256 to 64 to lower the model's capacity.

Added dropout and recurrent dropout of 0.3 to both the encoder and decoder LSTM layers to regularize the model.

Lowered the learning rate to 0.001 for smoother training.

Added early stopping (patience of 3 epochs, restoring the best weights) to halt training once validation loss stops improving.
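To make the patience setting concrete, here is a minimal plain-Python sketch of patience-based stopping. It is a simplified illustration and not the Keras implementation; the function name `simulate_early_stopping` and the validation losses are made up for this example.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (stopped_epoch, best_epoch) as 0-based epoch indices.

    Simplified illustration of patience-based stopping; not the Keras source.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement in validation loss
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # Stop here; with restore_best_weights=True the weights
                # from best_epoch would be the ones kept.
                return epoch, best_epoch
    # Training ran out of epochs without triggering the stop.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, invented for illustration only
losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.48, 0.40]
stopped, best = simulate_early_stopping(losses, patience=3)
print(stopped, best)  # 5 2
```

In this made-up run, training stops at epoch index 5 and never reaches the lower loss at index 6, which is the trade-off a small patience value makes.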

Results Interpretation

Before: Training accuracy was 98% but validation accuracy was only 70%, showing overfitting.

After: Training accuracy dropped to 90%, validation accuracy improved to 87%, and validation loss decreased, indicating better generalization.

Adding dropout, reducing model size, lowering the learning rate, and using early stopping together reduce overfitting and improve validation performance in sequence-to-sequence models.
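The before and after figures can be checked against the task's targets with a few lines of arithmetic. This is a small sketch using only the numbers quoted above; the helper name `summarize` and the constant names are hypothetical.

```python
TARGET_MIN_VAL_ACC = 0.85    # from the task: validation accuracy of at least 85%
TARGET_MAX_TRAIN_ACC = 0.92  # from the task: training accuracy below 92%


def summarize(train_acc, val_acc):
    """Return (generalization gap, whether both task targets are met)."""
    gap = round(train_acc - val_acc, 4)
    meets_target = val_acc >= TARGET_MIN_VAL_ACC and train_acc < TARGET_MAX_TRAIN_ACC
    return gap, meets_target


before = summarize(0.98, 0.70)
after = summarize(0.90, 0.87)
print(before)  # (0.28, False)
print(after)   # (0.03, True)
```

With a real training run, the two accuracies could be read from `history.history["accuracy"]` and `history.history["val_accuracy"]`. Because early stopping restores the best weights, the last recorded epoch is not necessarily the one whose weights are kept.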

Bonus Experiment

Try using a bidirectional LSTM in the encoder to see if it improves translation accuracy further.

💡 Hint

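Wrap the encoder LSTM in a Bidirectional layer and observe how validation accuracy changes. The decoder's initial state must then match the size of the concatenated forward and backward encoder states.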
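One possible way to set up the bonus experiment is sketched below. It assumes the same token counts and regularization settings as the solution; `latent_dim` and the state variable names are introduced here for illustration. The forward and backward states are concatenated, so the decoder LSTM needs twice as many units as each encoder direction.

```python
from tensorflow.keras.layers import Input, LSTM, Dense, Bidirectional, Concatenate
from tensorflow.keras.models import Model

num_encoder_tokens = 100
num_decoder_tokens = 100
latent_dim = 64  # units per direction (assumption: same size as the solution)

# Bidirectional encoder: returns the output plus h and c states for each direction
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = Bidirectional(
    LSTM(latent_dim, return_state=True, dropout=0.3, recurrent_dropout=0.3)
)
_, fwd_h, fwd_c, bwd_h, bwd_c = encoder(encoder_inputs)
state_h = Concatenate()([fwd_h, bwd_h])
state_c = Concatenate()([fwd_c, bwd_c])

# The decoder must accept states of size 2 * latent_dim
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(2 * latent_dim, return_sequences=True, return_state=True,
                    dropout=0.3, recurrent_dropout=0.3)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

Note that this roughly doubles the encoder's parameters and widens the decoder to 128 units, which adds capacity and may bring some overfitting back. Setting `latent_dim = 32` would keep the decoder at 64 units for a closer comparison with the solution.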