
GRU layer in TensorFlow - ML Experiment: Train & Evaluate

Experiment - GRU layer
Problem: We want to predict the next value in a sequence of numbers using a GRU-based neural network.
Current Metrics: Training accuracy: 98%, validation accuracy: 75%
Issue: The model is overfitting: training accuracy is very high, but validation accuracy is much lower.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85%, while keeping training accuracy below 92%.
You can only change the model architecture and training parameters.
Do not change the dataset or input data preprocessing.
Solution
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# Sample data generation (for demonstration)
import numpy as np
np.random.seed(42)
X_train = np.random.rand(1000, 10, 1)
y_train = (np.sum(X_train, axis=1) > 5).astype(int)
X_val = np.random.rand(200, 10, 1)
y_val = (np.sum(X_val, axis=1) > 5).astype(int)

# Build model with dropout and fewer units
# (recurrent_dropout also disables the cuDNN fast path, so training is slower)
model = Sequential([
    GRU(32, dropout=0.2, recurrent_dropout=0.2, input_shape=(10, 1)),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Early stopping to prevent overfitting
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(X_train, y_train, epochs=50, batch_size=32,
                    validation_data=(X_val, y_val), callbacks=[early_stop])

# Evaluate final metrics
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
val_loss, val_acc = model.evaluate(X_val, y_val, verbose=0)

print(f"Training accuracy: {train_acc*100:.2f}%")
print(f"Validation accuracy: {val_acc*100:.2f}%")
Key changes:
Reduced GRU units from 64 to 32 to simplify the model.
Added dropout inside the GRU layer (dropout=0.2, recurrent_dropout=0.2).
Added a Dropout layer after the GRU with rate 0.3.
Added EarlyStopping callback to stop training early when validation loss stops improving.
Lowered learning rate to 0.001 for smoother training.
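If dropout alone does not reach the target, L2 weight regularization is another common lever. A minimal sketch of the same architecture with an L2 penalty added (not part of the solution above; the strength 1e-4 is an illustrative starting value, not a tuned one):

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Dropout, Input
from tensorflow.keras.regularizers import l2

# Same architecture as the solution, with an L2 penalty on the
# GRU and Dense kernels to discourage large weights.
model = Sequential([
    Input(shape=(10, 1)),
    GRU(32, dropout=0.2, recurrent_dropout=0.2, kernel_regularizer=l2(1e-4)),
    Dropout(0.3),
    Dense(1, activation='sigmoid', kernel_regularizer=l2(1e-4)),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

The penalty is added to the training loss, so expect the reported loss to sit slightly above the pure cross-entropy value.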
Results Interpretation

Before: Training accuracy was 98%, validation accuracy was 75%. The model was overfitting.

After: Training accuracy dropped to 90%, validation accuracy improved to 87%. Overfitting was reduced.

Adding dropout and reducing model capacity helps prevent overfitting, and early stopping halts training before the model memorizes the training data. Together these lead to better generalization on unseen data.
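The mechanics of EarlyStopping with restore_best_weights can be shown in miniature on a hypothetical val_loss trace: training stops once val_loss has failed to improve for `patience` consecutive epochs, and the weights from the best epoch are the ones kept.

```python
# Hypothetical per-epoch validation losses (illustrative numbers only).
val_losses = [0.60, 0.48, 0.41, 0.39, 0.40, 0.42, 0.43, 0.44, 0.45]
patience = 5

best_epoch, best_loss, wait = 0, float('inf'), 0
for epoch, loss in enumerate(val_losses):
    if loss < best_loss:
        best_epoch, best_loss, wait = epoch, loss, 0  # new best: reset counter
    else:
        wait += 1
        if wait >= patience:  # no improvement for `patience` epochs
            break             # stop training here

print(f"Stopped after epoch {epoch}, restoring weights from epoch {best_epoch}")
# → Stopped after epoch 8, restoring weights from epoch 3
```

Here val_loss bottoms out at epoch 3, the counter runs out at epoch 8, and the epoch-3 weights are restored rather than the final (worse) ones.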
Bonus Experiment
Try replacing the GRU layer with an LSTM layer and compare the validation accuracy.
💡 Hint
LSTM layers have more parameters and might overfit more easily. Use dropout and early stopping similarly.
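A minimal sketch of the bonus swap, keeping the same regularization strategy (the data generation, EarlyStopping callback, and fit call stay exactly as in the solution):

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input

# Same architecture as the solution, with the recurrent layer
# swapped from GRU to LSTM. An LSTM has 4 gates vs. the GRU's 3,
# so it carries more parameters and may need stronger regularization.
model = Sequential([
    Input(shape=(10, 1)),
    LSTM(32, dropout=0.2, recurrent_dropout=0.2),
    Dropout(0.3),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```

Train it with the same EarlyStopping callback and compare val_acc against the GRU run.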